This article discusses the concept of information and its intimate relationship with physics. After
an introduction of all the necessary quantum mechanical and information theoretical concepts we 
analyze Landauer's principle that states that the erasure of information is inevitably accompanied 
by the generation of heat. We employ this principle to rederive a number of results in classical and 
quantum information theory whose rigorous mathematical derivations are difficult. This demon- 
strates the usefulness of Landauer's principle and provides an introduction to the physical theory of 
information. 
I. INTRODUCTION

In recent years great interest in quantum information
theory has been generated by the prospect of employing
its laws to design devices of surprising power. Ideas
include quantum computation, quantum teleportation
and quantum cryptography. In
this article, we will not deal with such applications di- 
rectly, but rather with some of the underlying concepts 
and physical principles. Rather than presenting very ab- 
stract mathematical proofs originating from the math- 
ematical theory of information, we will base our argu- 
ments as far as possible on the paradigm that informa- 
tion is physical. In particular, we are going to employ 
the fact that the erasure of one bit of information always 
increases the thermodynamical entropy of the world by 
$k \ln 2$. This principle, originally suggested by Rolf Landauer
in 1961 [12], has been applied successfully by
Charles Bennett to resolve the notorious Maxwell's demon
paradox [13,14]. In this article we will argue that
Landauer's principle provides a bridge between informa- 
tion theory and physics and that, as such, it sheds light 
on a number of issues regarding classical and quantum 
information processing and the truly quantum mechanical
feature of entanglement and non-local correlations.
We introduce the basic concepts at both an informal
level and a more mathematical level to allow a more
thorough understanding of these concepts. This enables
us to approach and answer a number of questions at the 
interface between pure physics and technology such as: 

1. What is the greatest amount of classical informa- 
tion we can send reliably through a noisy classical 
or quantum channel? 

2. Can quantum information be copied and com- 
pressed as we do with classical information on a 
daily basis? 

3. If entanglement is such a useful resource, how much 
of it can be extracted from an arbitrary quantum 
system composed of two parts by acting locally on 
each of the two? 

The full meaning of these questions and their answers
will gradually emerge after explaining some of the unpleasant
but unavoidable jargon used to state them. For
the time being, our only remark is that Landauer's principle
will be our companion in this journey. A glance at
what lies ahead can be readily obtained by inspecting the
"map" of this paper in Fig. 1.
FIG. 1. The essential structure of the article is captured in this diagram.



A final word on the level of this article: the concepts 
of entanglement and quantum information are of great 
importance in contemporary research on quantum me- 
chanics, but they seldom appear in graduate textbooks 
on quantum mechanics. This article, while making little 
claim to originality in the sense that it does not derive 
new results, tries to fill this gap. It provides an introduc- 
tion to the physical theory of information and the concept 
of entanglement and is written from the perspective of an 
advanced undergraduate student in physics, who is eager 
to learn, but may not have the necessary mathemati- 
cal background to directly access the original sources. 
This pedagogical outlook is also reflected in the choice
of particularly readable references, mainly textbooks and
lecture notes, that we hope the reader will consult for
a more comprehensive treatment of the advanced topics.
We also try our best to use mathematics as a
language rather than as a weapon. Every idea is first motivated,
then illustrated with a non-trivial example and
occasionally extended to the general case by using Landauer's
principle. The reader will not be drowned in a
sea of indices or obscure symbols, but he will (hopefully) 
be guided to work out the simple examples in parallel 
with the text. Most of the subtle concepts in quantum 
mechanics can indeed be illustrated using simple matrix 
manipulations. On the other hand, the choice to actively 
involve the reader in calculations makes this article unsuitable
for bedtime reading. In fact, it is a good idea
to keep a pen and plenty of blank paper within reach
while you read on.



II. CLASSICAL INFORMATION ENCODED IN 
CLASSICAL SYSTEMS 



A. The bit 

In this section we will try to build an intuitive understanding
of the concept of classical information. A more
quantitative approach will be taken in a later section;
for the full-blown mathematical apparatus we refer the
reader to textbooks, e.g. [21].

Imagine that you are holding an object, be it an array 
of cards, geometric shapes or a complex molecule and we 
ask the following question: what is the information con- 
tent of this object? To answer this question, we introduce 
another party, say a friend, who shares some background 
knowledge with us (e.g. the same language or other sets 
of prior agreements that make communication possible at 
all), but who does not know the state of the object. We 
define the information content of the object as the size 
of the set of instructions that our friend requires to be 
able to reconstruct the object, or better the state of the 
object. For example, assume that the object is a spin-up 
particle and that we share with the friend the background 
knowledge that the spin is oriented either upwards or 
downwards along the z direction with equal probability 
(see Fig. 2 for a slightly more involved example). In this
case, the only instruction we need to transmit to another
party to let him recreate the state is whether the state
is spin-up ↑ or spin-down ↓. This example shows that in
some cases the instruction transmitted to our friend is
just a choice between two alternatives. More generally,
we can reduce a complicated set of instructions to n binary
choices. If that is done we readily get a measure of
the information content of the object by simply counting
the number of binary choices. In classical information
theory, a variable which can assume only the values 0 or
1 is called a bit. Instructions to make a binary choice can
be given by transmitting 1 to suggest one of the alternatives
(say arrow up ↑) and 0 for the other (arrow down
↓). To sum up, we say that n bits of information can
be encoded in a system when instructions in the form of
n binary choices need to be transmitted to identify or
recreate the state of the system.
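
The counting of binary choices can be made concrete with a short sketch. The helper names `bits_needed` and `instructions_for`, and the particular assignment of 0 and 1 to each alternative, are illustrative assumptions rather than anything fixed by the text; the only input taken from the article is that one of several equally likely objects is identified by a sequence of yes/no choices.

```python
import math
from itertools import product

def bits_needed(num_alternatives):
    """Binary choices required to single out one of num_alternatives equally likely states."""
    return math.ceil(math.log2(num_alternatives))

# Assumed labelling for a two-level decision tree: first choice is the shape,
# second choice is the orientation. The 0/1 assignment is arbitrary.
SHAPES = ("triangle", "square")
ORIENTATIONS = ("horizontal", "rotated")

def instructions_for(shape, orientation):
    """Return the two binary choices, as a bit string, that identify one object."""
    return f"{SHAPES.index(shape)}{ORIENTATIONS.index(orientation)}"

# Every one of the four objects receives a distinct two-bit instruction.
objects = list(product(SHAPES, ORIENTATIONS))
codes = {obj: instructions_for(*obj) for obj in objects}

for obj, code in codes.items():
    print(obj, "->", code)
print("bits for 4 equally likely objects:", bits_needed(len(objects)))
```

With four equally likely objects the sketch assigns four distinct two-bit strings, which is the sense in which two bits of information are transmitted.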

[Fig. 2 decision tree: first branch triangle/square, second branch horizontal/rotated]

FIG. 2. An example of a decision tree. Two binary choices
have to be made to identify the shape (triangle or square) and
the orientation (horizontal or rotated). In sending with equal
probability one of the four objects, one therefore transmits 2
bits of information.



B. Information is physical

In the previous subsection we have introduced the concept
of the bit as the unit of information. In the course
of the argument we mentioned already that information
can be encoded in physical systems. In fact, looking at it
more closely, we realize that any information is encoded,
processed and transmitted by physical means. Physical
systems such as capacitors or spins are used for storage,
sound waves or optical fibers for transmission, and
the laws of classical mechanics, electrodynamics or quantum
mechanics dictate the properties of these devices and
limit our capabilities for information processing. These
rather obvious looking statements, however, have significant
implications for our understanding of the concept of
information, as they emphasize that the theory of information
is not a purely mathematical concept, but that
the properties of its basic units are dictated by the laws
of physics. The different laws that rule in the classical
world and the quantum world, for example, result in different
information processing capabilities, and it is this
insight that sparked the interest in the general field of
quantum information theory.

In the following we would like to further corroborate
the view that information and physics should be unified
into a physical theory of information by showing that the
process of erasure of information is invariably accompanied
by the generation of heat and that this insight leads
to a resolution of the longstanding Maxwell demon paradox,
which is really a prime example of the deep connection
between physics and information. The rest of
the article will then attempt to apply the connection between
erasure of information and physical heat generation
further to gain insight into recent results in quantum
information theory.



C. Erasing classical information from classical
systems: Landauer's principle

We begin our investigations by concentrating on classical
information. In 1961, Rolf Landauer had the important
insight that there is a fundamental asymmetry in the
way Nature allows us to process information [12]. Copying
classical information can be done reversibly and without
wasting any energy, but when information is erased
there is always an energy cost of $kT \ln 2$ per classical bit
to be paid. For example, as shown in Fig. 3, we can encode
one bit of information in a binary device composed
of a box with a partition.



FIG. 3. We erase the information about the position of the
atom. First we extract the wall separating the two halves of
the box. Then we use a piston to shift the atom to the left
side of the box. After the procedure, the atom is on the left
hand side of the box irrespective of its initial state. Note that
the procedure has to work irrespective of whether the atom
is initially on the right (figure (a)) or on the left side (figure
(b)).

The box is filled with a one-molecule gas that can be
on either side of the partition, but we do not know which
one. We assume that we erase the bit of information encoded
in the position of the molecule by extracting the
partition and compressing the molecule in the right part
of the box irrespective of where it was before. We say
that information has been erased during the compression
because we will never find out where the molecule
was originally. Any binary message encoded is lost! The
physical result of the compression is a decrease in the
thermodynamical entropy of the gas by $k \ln 2$. The minimum
work that we need to do on the box is $kT \ln 2$, if
the compression is isothermal and quasi-static. Furthermore,
an amount of heat equal to $kT \ln 2$ is dumped in the
environment at the end of the process.
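
To get a feeling for the size of this cost, the following sketch evaluates $kT \ln 2$ and checks it against a direct numerical integration of the work needed to halve the volume of a one-molecule ideal gas at constant temperature. The function names, the choice of 300 K as "room temperature" and the ideal-gas relation $p = kT/V$ for a single molecule are assumptions made for illustration; the value of the Boltzmann constant is the exact SI one.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K (exact value in the SI)

def landauer_heat(temperature_kelvin, bits=1):
    """Minimum heat k*T*ln(2) per erased bit, in joules."""
    return bits * BOLTZMANN_K * temperature_kelvin * math.log(2)

def isothermal_compression_work(temperature_kelvin, volume_ratio=2.0, steps=100_000):
    """Work done on a one-molecule ideal gas (assumed p = kT/V) when it is
    compressed quasi-statically from V to V/volume_ratio, by the midpoint rule."""
    v_start, v_end = 1.0, 1.0 / volume_ratio  # arbitrary volume units; only the ratio matters
    dv = (v_start - v_end) / steps
    work = 0.0
    for i in range(steps):
        v_mid = v_end + (i + 0.5) * dv
        work += BOLTZMANN_K * temperature_kelvin / v_mid * dv
    return work

room_temperature = 300.0  # kelvin, an assumed illustrative value
print("Landauer heat per bit:", landauer_heat(room_temperature), "J")
print("Integrated compression work:", isothermal_compression_work(room_temperature), "J")
```

At 300 K both numbers come out at a few times $10^{-21}$ joules, which makes plain how small the cost per bit is on everyday energy scales.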

Landauer conjectured that this energy/entropy cost
cannot be reduced below this limit irrespective of how
the information is encoded and subsequently erased: it
is a fundamental limit. In the discussion of the Maxwell
demon in the next section we will see that this principle
can be deduced from the second law of thermodynamics
and is in fact equivalent to it. Landauer's discovery
is important both theoretically and practically: on the
one hand it relates the concept of information to physical
quantities like thermodynamical entropy and free energy,
and on the other hand it may force the future designers
of quantum devices to take into account the heat production
caused by the erasure of information, although this
effect is tiny and negligible in today's technology.

At this point we are ready to summarize our findings 
on the physics of classical information. 



1) Information is always encoded in a physical system.

2) The erasure of information causes a generation of $kT \ln 2$
of heat per bit in the environment.



Armed with this knowledge we will present the first
successful application of the erasure principle: the solution
of Maxwell's demon paradox, which has plagued
the foundations of thermodynamics for almost a century.



D. Maxwell's demon deposed



1. The paradox



In this section we present a simplified version of
Maxwell's demon paradox suggested by Leo Szilard in
1929 [23]. It employs an intelligent being or a computer
of microscopic size, operating a heat engine with a
single-molecule working fluid (Fig. 4).



[Fig. 4 panel labels: (a); (b) demon determines position of atom; demon memory; gas expands, converting heat from reservoir to work]

FIG. 4. A schematic picture of Szilard's engine, a box
filled with a one-atom gas. Initially the position of the atom
is unknown. Then the demon measures the position and, depending
on the outcome, inserts a piston. Then the gas expands
and thereby does work on a load attached to the piston.
This procedure is repeated and we apparently do work at the
sole expense of extracting heat from one reservoir only.

In this scheme, the molecule is originally placed in
a box, free to move in the entire volume V as shown
in step (a). Step (b) consists of inserting a partition
which divides the box in two equal parts. At this point
Maxwell's demon measures on which side of the box
the molecule is and records the result (in the figure the
molecule is pictured on the right-hand side of the partition
as an example). In step (c) the Maxwell demon uses
the information to replace the partition with a piston and
couple the latter to a load. In step (d) the one-molecule
gas is put in contact with a reservoir and expands isothermally
to the original volume V. During the expansion
the gas draws heat from the reservoir and does work to
lift the load. Apparently the device is returned to its initial
state and it is ready to perform another cycle whose
net result is again full conversion of heat into work, a
process forbidden by the second law of thermodynamics.

Despite its deceptive simplicity, the argument above
has missed an important point: while the gas in the box
has returned to its initial state, the mind of the demon
hasn't! In fact, the demon needs to erase the information
stored in his mind for the process to be truly cyclic.
This is because the information in the brain of the demon
is stored in physical objects and cannot be regarded
as a purely mathematical concept! The first attempts to
solve the paradox had missed this point completely and
relied on the assumption that the act of acquisition of
information by the demon entails an energy cost equal
to the work extracted by the demonic engine, thus preventing
the second law from being defeated. This assumption
is wrong! Information on the position of the particle can
be acquired reversibly without having to pay the energy
bill, but erasing information does have a cost! This important
remark was first made by Bennett in a very readable
paper on the physics of computation. We will
analyze his argument in some detail. Bennett developed
Szilard's earlier suggestion [23] that the demon's mind
could be viewed as a two-state system that stores one bit
of information about the position of the particle. In this
sense, the demon's mind can be an inanimate binary system,
which represents a significant step forward, as it rids
the discussion of the dubious concept of intelligence.
After the particle in the box is returned to the initial
state the bit of information is still stored in the demon's
mind (i.e. in the binary device). Consequently, this bit
of information needs to be erased to return the demon's
mind to its initial state. By Landauer's principle this
erasure has an energy cost

$$W_{\mathrm{erasure}} = -kT \ln 2. \tag{1}$$

On the other hand, the work extracted by the demonic
engine in the isothermal expansion is

$$W_{\mathrm{extracted}} = +kT \ln 2. \tag{2}$$

All the work gained by the engine is needed to erase the 
information in the demon's mind, so that no net work is 
produced in the cycle. Furthermore, the erasure trans- 
fers into the reservoir the same amount of heat that was 
drawn from it originally. So there is no net flow of heat 
either. There is no net result after the process is com- 
pleted and the second law of thermodynamics is saved! 
The crucial point in Bennett's argument is that the in- 
formation processed by the demon must be encoded in 
a physical system that obeys the laws of physics. The 
second law of thermodynamics states that there is no en- 
tropy decrease in a closed system that undergoes a cyclic 
transformation. Therefore if we let the demon measure
Szilard's engine we need to include the physical state
he uses to store the information in the analysis, otherwise
there would be an interaction with the environment and
the system would not be closed. One could also view the
demon's mind as a heat bath initially at zero temperature.
After storing information in it, the mind appears
to an outside observer like a random sequence of digits
and one could therefore say that the demon's mind has
been heated up. Having realized that the demon's mind
is a second heat bath, we now have a perfectly acceptable
process that does not violate the second law of thermodynamics.



2. Generalized entropy 



[Fig. 5 panel labels: Demon; System + Demon]

FIG. 5. The two different viewpoints
discussed in this section. Either the demon is outside the system,
which consists of the box and the atom only (figure (a)), or
the demon and the box form a joint system that is closed (figure (b)).

The solution of the paradox presented in the last section
views the "brain of the demon" as a physical system
to be included in the entropy balance together with the
box that is being observed (see part (b) of Fig. 5).

A different approach can be taken if one does not want
to consider explicitly the workings of the demon's mind,
but just treat it as an external observer that obtains information
about the system (see part (a) of Fig. 5).
This is done by including in the definition of the entropy
of the system a term that represents the knowledge that
the demon has on the state of the system, together with
the well-known term representing how ordered the state
is.

In the context of Szilard's engine we found that the demon
extracts from the engine an amount of work given
by

$$W_{\mathrm{extracted}} = kT \ln 2 = \Delta Q = T \Delta S, \tag{3}$$

where $\Delta S$ is the change of thermodynamical entropy in
the system when the heat $\Delta Q$ is absorbed from the environment.
On the other hand, to erase his memory he
uses at least an equal amount of work given by

$$W_{\mathrm{erasure}} = -kT \ln 2 = -T I, \tag{4}$$

where $I$ denotes the information required by the demon
to specify on which side of the box the molecule is, times
the scaling factor $k \ln 2$. In this case the information is
just 1 bit. The scaling factor is introduced for consistency
because the definition of information is given in bits as a
logarithm to base 2 of the number of memory levels in
the demon's mind.

The total work gained (equal to the total heat exchanged
$Q_{\mathrm{total}}$, since the system is kept at constant temperature
T) is thus given by

$$W_{\mathrm{total}} = W_{\mathrm{erasure}} + W_{\mathrm{extracted}} = Q_{\mathrm{total}} = T(\Delta S - I) = 0. \tag{5}$$

This suggests that the second law of thermodynamics is
not violated if we introduce a generalized definition of
entropy $\mathcal{S}$ (in bits) as the difference of the thermodynamical
entropy of the system $\Delta S$ and the information
about the system $I$ possessed by an external observer,

$$\mathcal{S} = \Delta S - I. \tag{6}$$



The idea of modifying the definition of thermodynamical
entropy that represents an objective property of the
physical system with an "informational term" relative to
an external observer appears bizarre at first sight. Physical
properties like entropy identify and distinguish physical
states. By introducing a notion such as information directly
in the second law of thermodynamics we somehow
bolster the view that an ensemble composed of partitioned
boxes each containing a molecule in a position
unknown to us is not the same physical state as an
ensemble in which we know exactly on which side of the
partition the molecule is in each box. Why? Because we
can extract work from the second state by virtue of the 
knowledge we gained, but we cannot do the same with 
the first. We will encounter similar arguments in later 
sections when we study the concept of information in the 
context of quantum theory. For the time being, we re- 
mark that the approach presented in this section to the 
solution of the Maxwell's demon paradox adds new mean- 
ing to the slogan information is physical. Information is 
physical because it is always encoded in a physical sys- 
tem and also because the information we possess about 
a physical system contributes to define the state of the 
system. 



E. The information content of a classical state in bits

So far we have discussed how information is encoded in
a classical system and subsequently erased from it. However,
we really haven't quantified the information content
of a complicated classical system composed of many
components, each of which can be in one of n states with
probability $p_n$. This problem is equivalent to determining
the information content of a long classical message. In
fact, a classical message is encoded in a string of classical
objects each representing a letter from a known alphabet
occurring with a certain probability. The agreed relation
between objects and letters represents the required background
knowledge for communication. Bob sends this
string of objects to Alice. She knows how the letters of
the alphabet are encoded in the objects, but she does
not know the message that Bob is sending. When Alice
receives the objects, she can decode the information in
the message, provided that none of the objects has been
accidentally changed on the way to her. Can we quantify
the information transmitted if we know that the i-th letter
occurs in the message with probability $p_i$? Let us begin
with some hand-waving, which is followed in the next
section by a formally correct argument. Assume that our
alphabet is composed of only two letters 1 and 0 occurring
with probability $p_1 = 0.1$ and $p_0 = 0.9$ respectively.
Suppose we send a very long message, what is the average
information sent per letter? Naively, one could say that
if each letter can be either 1 or 0 then the information
transmitted per letter has to be 1 bit. But this answer
does not take into account the different probabilities associated
with receiving a 1 or a 0. For example, presented
with an object Alice can guess its identity in 90% of the
cases by simply assuming it is 0. On the other hand, if
the letters 1 and 0 come out with equal probability, she
will guess correctly only 50% of the time. Therefore her
surprise will usually be bigger in the second case as she
doesn't know what to expect. Let us quantify Alice's surprise
when she finds letter i, which normally occurs with
probability $p_i$, by

$$\mathrm{surprise}_{\text{letter } i} = \log \frac{1}{p_i}. \tag{7}$$

We have chosen the logarithm of $1/p_i$ because if we guess
two letters, then the surprise should be additive, i.e.

$$\mathrm{surprise}_{\text{letter } i} + \mathrm{surprise}_{\text{letter } j} = \log \frac{1}{p_i p_j} = \log \frac{1}{p_i} + \log \frac{1}{p_j}, \tag{8}$$

and this can only be satisfied by the logarithm. Now we
can compute the average surprise, which we find to be
given by the Shannon entropy

$$H = \sum_i p_i \log \frac{1}{p_i} = -\sum_i p_i \log p_i. \tag{9}$$

This argument is of course hand-waving and therefore
the next section addresses the problem more formally by
asking how much one can compress a message, i.e. how
much redundancy is included in a message.
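
The hand-waving argument is easy to check numerically. The sketch below is only an illustration: the function names `surprise` and `average_surprise` are made up for this example, and the logarithm is taken to base 2 so that the result is expressed in bits. It evaluates the surprise of each letter and the average surprise for the two-letter alphabet with probabilities 0.1 and 0.9, and compares it with the equal-probability case.

```python
import math

def surprise(p):
    """Surprise (in bits) of finding a letter that occurs with probability p."""
    return math.log2(1.0 / p)

def average_surprise(probabilities):
    """Average surprise per letter, i.e. the sum of p_i * log2(1/p_i)."""
    return sum(p * surprise(p) for p in probabilities if p > 0)

biased = [0.1, 0.9]   # probabilities of the letters 1 and 0 in the example
fair = [0.5, 0.5]     # equal probabilities

print("surprise of a 1 (p = 0.1):", surprise(0.1))
print("surprise of a 0 (p = 0.9):", surprise(0.9))
print("average surprise, biased alphabet:", average_surprise(biased))
print("average surprise, fair alphabet:", average_surprise(fair))

# Additivity: the surprise of two independent letters is the sum of the surprises.
print("additivity check:", surprise(0.1 * 0.9), "vs", surprise(0.1) + surprise(0.9))
```

For the biased alphabet the average surprise is a little under half a bit per letter, whereas the fair alphabet gives exactly one bit, in line with the intuition that Alice learns less from a letter she could mostly have guessed.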



1. Shannon's entropy 

In 1948 Shannon developed a rigorous framework for
the description of information and derived an expression
for the information content of the message which indeed
depends on the probability of each letter occurring and
results in the Shannon entropy. We will illustrate Shannon's
reasoning in the context of the example above.
Shannon invoked the law of large numbers and stated
that, if the message is composed of N letters where N is
very large, then the typical messages will be composed of
$N p_1$ 1's and $N p_0$ 0's. For simplicity, we assume that N
is 8 and that $p_1$ and $p_0$ are 1/8 and 7/8 respectively. In this
case the typical messages are the 8 possible sequences
composed of 8 binary digits of which only one is equal to
1 (see left side of Fig. 6).



8 original messages        Compressed messages obtained
                           by relabelling sequences

00000001          →        000
00000010          →        001
00000100          →        010
00001000          →        011
00010000          →        100
00100000          →        101
01000000          →        110
10000000          →        111


FIG. 6. The idea behind classical data compression. The 
most likely sequences are relabeled using fewer bits while rare 
sequences are discarded. The smaller number of bits still al- 
lows the reconstruction of the original sequences with very 
high probability. 

As the length of the message increases (i.e. N gets
large) the probability of getting a message which is all
1's or any other message that differs significantly from
a typical sequence is negligible, so that we can safely ignore
them. But how many distinct typical messages are
there? In the previous example the answer was clear:
just 8. In the general case one has to find in how many
ways the $N p_1$ 1's can be arranged in a sequence of N
letters. Simple combinatorics tells us that the number
of distinct typical messages is

$$\frac{N!}{(N p_1)!\,(N p_0)!} \tag{10}$$

and they are all equally likely to occur. Therefore, we can
label each of these possible messages by a binary number.
If that is done, the number of binary digits $I$ we need to
label each typical message is equal to $\log_2 \frac{N!}{(N p_1)!\,(N p_0)!}$. In
the example above each of the 8 typical messages can be
labeled by a binary number composed of $I = \log_2 8 = 3$
digits (see Fig. 6). It therefore makes sense that the
number / is also the number of bits encoded in the mes- 
sage, because Alice can unambiguously identify the con- 
tent of each typical message if Bob sends her the corre- 
sponding binary number, provided they share the back- 
ground knowledge on the labeling of the typical messages. 
All other letters in the original message are really redun- 
dant and do not add any information! When the message 
is very long almost any message is a typical one. There- 
fore, Alice can reconstruct with arbitrary precision the 
original N bits message Bob wanted to send her just by 
receiving $I$ bits. In the example above, Alice can compress
an 8 bit message down to 3 bits. However, the efficiency
of this procedure is limited when the message is
only 8 letters long, because the approximation of considering
only typical sequences is not that good. We leave it to
the reader to show that the number of bits $I$ contained in
a large N-letter message can in general be written, after
using Stirling's formula, as

$$I = -N (p_1 \log p_1 + p_0 \log p_0). \tag{11}$$



If we plug the numbers 1/8 and 7/8 for $p_1$ and $p_0$ respectively
in equation (11), we find that the information content per
symbol when N is very large is approximately 0.5436
bits. On the other hand, when the binary letters 1 and
0 appear with equal probabilities, then compression is not
possible, i.e. the message has no redundancy and each
letter of the message contains one full bit of information.
These results match nicely the intuitive arguments
given above.
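
The passage from counting typical messages to the entropy formula can be checked numerically without invoking Stirling's formula explicitly. In the sketch below the function names are illustrative assumptions; `bits_per_letter_exact` takes the base-2 logarithm of the exact number of typical messages, counted with the binomial coefficient, and divides by the message length, while `binary_entropy` evaluates the large-N expression per letter.

```python
import math

def binary_entropy(p1):
    """Information per letter, -(p1 log2 p1 + p0 log2 p0), for a two-letter alphabet."""
    p0 = 1.0 - p1
    return -sum(p * math.log2(p) for p in (p1, p0) if p > 0)

def bits_per_letter_exact(n_letters, n_ones):
    """log2 of the number of typical messages, N!/((N p1)! (N p0)!), divided by N."""
    return math.log2(math.comb(n_letters, n_ones)) / n_letters

# The 8-letter example: 8 typical messages, labelled with 3 bits.
print("typical messages for N = 8:", math.comb(8, 1))
print("bits per letter for N = 8:", bits_per_letter_exact(8, 1))

# Longer messages with the same letter frequencies approach the entropy.
for n in (80, 800, 8000):
    print(f"N = {n}: {bits_per_letter_exact(n, n // 8):.4f} bits per letter")

print("large-N limit:", round(binary_entropy(1 / 8), 4))
```

For N = 8 the count gives 3/8 = 0.375 bits per letter, noticeably below the limiting value; as N grows with the fraction of 1's held at 1/8, the exact count approaches roughly 0.5436 bits per letter, which illustrates why the typical-sequence approximation is poor for very short messages.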

Equation (11) can easily be generalized to an alphabet
of n letters, the i-th of which occurs with probability $p_i$. In
this case, the average information in bits transmitted per
symbol in a message composed of a large number N of
letters is given by the Shannon entropy:

$$\frac{I}{N} = H\{p_i\} = -\sum_{i=1}^{n} p_i \log p_i. \tag{12}$$

We remark that the information content of a complicated
classical system composed of a large number N of
subsystems, each of which can be in any of n states occurring
with probabilities $p_i$, is given by $N \times H\{p_i\}$.



2. Boltzmann versus Shannon entropy 

The mathematical form of the Shannon entropy H differs
only by a constant from the entropy formula derived
by Boltzmann after counting how many ways there are to
assemble a particular arrangement of matter and energy
in a physical system,

$$S = -k \ln 2 \sum_i p_i \log p_i. \tag{13}$$

To convert one bit of classical information into units of
thermodynamical entropy we just need to multiply by
$k \ln 2$. By Landauer's erasure principle, the entropy so
obtained is the amount of thermodynamical entropy you
will generate in erasing the bit of information.

Boltzmann's statistical interpretation of entropy helps
us to understand the origin of equation (6). Consider
our familiar example of the binary device in which the
molecule can be on either side of the partition with equal
probabilities. An observer who has no extra knowledge
will use Boltzmann's formula and work out that the entropy
is $k \ln 2$. What about an observer who has 1 extra
bit of information on the position of the molecule? He
will use Boltzmann's formula again, but this time
he will use the values 1 and 0 for the probabilities, because
he knows on which side the molecule is. After
plugging these numbers in equation (13), he will conclude
that the entropy of the system is 0, in agreement with
the result obtained if we use equation (6). The acquisition
of information about the state of a system changes its
entropy simply because the entropy is a measure of our
ignorance of the state of the system, as is transparent from
Boltzmann's analysis.
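
The two observers can be compared directly in a few lines. This is a sketch under stated assumptions: the function name `thermodynamic_entropy` is invented for the illustration, the logarithm inside the sum is taken to base 2, and the usual convention that a state of zero probability contributes nothing is applied.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K (exact value in the SI)

def thermodynamic_entropy(probabilities):
    """S = -k ln2 * sum_i p_i log2 p_i in J/K, with zero-probability states skipped."""
    shannon_bits = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return BOLTZMANN_K * math.log(2) * shannon_bits

ignorant_observer = [0.5, 0.5]   # molecule equally likely on either side
informed_observer = [1.0, 0.0]   # observer knows which side the molecule is on

print("entropy without extra knowledge:", thermodynamic_entropy(ignorant_observer), "J/K")
print("entropy with one extra bit:", thermodynamic_entropy(informed_observer), "J/K")
print("k ln 2:", BOLTZMANN_K * math.log(2), "J/K")
```

The first observer obtains $k \ln 2$ and the second obtains zero, so the single bit of knowledge accounts for exactly the difference.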



F. Sending classical information through a noisy 
classical channel 



In the previous section, we found that the Shannon 
entropy measures the information content in bits of an 
arbitrary message whose letters are encoded in classical 
objects. Throughout our discussion, we made an impor- 
tant assumption: that the message is encoded and trans- 
mitted to the recipient without errors. It is obvious that 
this situation is quite unrealistic. In realistic scenarios 
communication errors are unavoidable. To the physicist's
eyes, the origin of noise in communication can be traced
all the way down to the unavoidable interaction between 
the environment and the physical systems in which each 
letter is encoded. The errors caused by the noise in the 
communication channel cannot be eliminated completely. 
However, one hopes to devise a strategy that enables the 
recipient of the message to detect and subsequently cor- 
rect the errors, without having to go all the way to the 
sender to check the original message. This procedure is 
sometimes referred to as coding the original message. 



1. Coding a classical message: an example 

For example, imagine that Bob wants to send to Alice 
a 1 bit message encoded in the state of a classical binary 
device in which a particle can be on the left hand side 
(encoding a 0) or the right hand side (encoding a 1) of a finite
potential barrier. Unfortunately, the system is noisy and
there is a probability for the binary letter to flip (i.e.
1 → 0 or 0 → 1). For example, a thermal fluctuation induced
by the environment may cause the particle in the
encoding device to overcome the potential barrier and go 
from the left hand side to the right hand side. Alice, who 
is not aware of this change, will therefore think that Bob 
attempted to send a 1 and not a 0. This event occurs 
with 1% probability so it is not that rare after all. On
the other hand, the (joint) probability that two such errors
occur in the same message is only 0.01% (1/100 × 1/100).
Alice and Bob decide to ignore the unlikely event of two 
errors happening in one encoding but they still want to 
protect their message against single errors. How can they 
achieve this? 

One strategy is to add extra digits to the original message
and dilute the information contained in it among all
the binary digits available in the extended message. Here
is an example. Alice and Bob add two extra digits. Now
their message is composed of 3 binary digits, but they
still want to get across only one bit of information. So
they agree that Alice will read a 1 whenever she receives
the sequence 111 and a 0 when she receives 000.

The reader can see that this encoding ensures safer
communication, because the worst that can happen is
that Alice receives a message in which not all the digits
are either 0s or 1s, for example 101. But that is no big
deal. In this case the original message was clearly 111,
because we have allowed for single errors only. Under
this assumption, an original message of the form 000
can never get transformed into 101 because that requires
flipping at least two bits.

This strategy protects the message from single errors
and therefore ensures that the error rate in the communication
is reduced to the order of 0.01% (the probability of a
double error; counting the three possible pairs of positions
gives about 0.03%). By simply adding two more extra bits
to the encoded message Bob can protect the message
against double errors and reduce the error rate further, to
the order of the probability of triple errors (about 0.001%).
Quite obviously one can make the error rate as small
as desired, but at the price of decreasing the ratio
(bits transmitted)/(binary letters employed). Is it possible
to achieve a finite ratio (bits transmitted)/(binary letters
employed) and an arbitrarily small error rate in the decoded
messages? We will address this question, which was first
answered by Shannon, in the next section.
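The trade-off described above can be checked numerically. The following is a minimal sketch, assuming that Alice decodes by majority vote and that each digit flips independently with probability q = 1%; the function name majority_vote_failure is ours and purely illustrative. Because the exact failure probability counts every way of placing the errors, it comes out somewhat larger than the single-pair estimate q².

```python
from math import comb

def majority_vote_failure(q: float, n: int) -> float:
    """Probability that more than half of n independently flipped digits are wrong."""
    threshold = n // 2 + 1  # smallest number of flips that defeats a majority vote
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(threshold, n + 1))

q = 0.01  # single-digit flip probability used in the text
for n in (1, 3, 5):
    rate = 1 / n  # bits transmitted per binary letter employed
    print(f"n={n}: failure probability {majority_vote_failure(q, n):.2e}, rate {rate:.2f}")
```

Each extra pair of digits lowers the failure probability, but the ratio of bits transmitted to binary letters employed falls as 1/n, which is exactly the price mentioned in the text.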



2. The capacity of a noisy classical channel via Landauer's 
principle 



Maybe surprisingly, one can indeed bring the error rate 
in the received message arbitrarily close
to zero, provided that the actual message of length N bits 
is "coded" in a much longer message of size Nc bits. The 
actual construction of efficient strategies to code a mes- 
sage is a task that requires a lot of ingenuity, but is not 
what we are after. Our concern here is to answer the 
following more fundamental question: 

Given that the probability of error is q, what is the 
largest number of bits N that we can transmit reliably 
through a noisy channel after encoding them in a larger 
message of size Nc bits?

In other words we want a bound on the classical information
capacity of a noisy channel. We start by remarking
that if the coded message is composed of Nc bits,
then the average number of errors will be qNc. If we
let the size of the message be very large, the probability
of getting a number of errors different from the average
value becomes vanishingly small. In the asymptotic limit
one will expect exactly qNc bits to be affected by errors
in the Nc-bit message. However, there are many ways
in which qNc errors can be distributed in the Nc bits of
the original message. In fact, we worked out the exact
number in the section on the Shannon entropy and it is
given by



number of ways the errors can be distributed = \binom{N_c}{qN_c}    (14)



The problem there was slightly different, but after 
rephrasing the argument a bit we can conclude that in or- 
der to specify how the qNc errors are distributed among 
the Nc message bits you need n bits of information, 
where n is given by:

n = \log \binom{N_c}{qN_c} \simeq -N_c [ q \log q + (1-q) \log(1-q) ] = N_c H(q).    (15)



The reader should convince himself that equation (15)
can be derived following the same steps used in the section
on the Shannon entropy. One just needs to rename the variables.
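As a rough numerical check of the estimate above, the sketch below compares the exact number of bits, log2 of the binomial coefficient, with Nc H(q) for growing message sizes. It assumes base-2 logarithms (so that n is measured in bits) and q = 1%; the helper name binary_entropy is ours.

```python
from math import comb, log2

def binary_entropy(q: float) -> float:
    """H(q) = -[q log2 q + (1-q) log2(1-q)], measured in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * log2(q) + (1 - q) * log2(1 - q))

q = 0.01  # flip probability, as in the example of the text
for n_c in (100, 10_000, 1_000_000):
    errors = round(q * n_c)                 # expected number of flipped digits
    exact_bits = log2(comb(n_c, errors))    # bits needed to locate the errors
    print(n_c, exact_bits / n_c, binary_entropy(q))
```

The ratio in the middle column approaches H(q) only for large Nc, which is why the argument is stated in the asymptotic limit.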

The short calculation above may inspire the following 
idea. Bob can send only Nc bits in total and he knows 
that he needs NcH(q) bits to specify the position of the 
errors. All he has to do, then, is to allocate NcH(q) 
binary digits to store the information on the position of 
the errors. At that point the remaining Nc — NcH(q) bi- 
nary digits will be fully available for safe communication. 
Unfortunately, Bob cannot implement this idea directly 
because it requires him to know, in advance, which let- 
ters of the message are going to be affected by errors. 
But the errors are random and they would occur even 
in the letters that supposedly store information on their 
positions! But there is something to be learned from this 
suggestion anyway. 

Suppose, instead, that Bob had diluted the informa- 
tion he wants to transmit among all the letters of the 
message as shown in the last section. When Alice re- 
ceives the string of binary digits and she deciphers the 
message, she gains knowledge of the actual message, but 
also the information necessary to extract the message 
from all the digits. This extra amount of information is 
implicitly provided by the coding technique and it is also 
diluted among all the letters in the message. To see this 
point more clearly, let us use Landauer's principle and 
ask how much entropy Alice generates when she decides 
to erase the message sent by Bob. For simplicity, let us 
stick to our simple example where Bob sends 3 bits to 
effectively transmit only a 1 bit message. In order to 
erase the information sent by Bob, Alice has to reset to 
zero the three classical binary devices sent by Bob and 
that generates an amount of entropy not less than 3k ln 2,
by Landauer's principle. But Alice has effectively acquired
only 1 bit of information, corresponding to k ln 2
of entropy. So why did she have to generate that extra
amount of entropy equal to 2k ln 2? Those extra 2
bits of information that she is erasing must have been
implicitly used to identify the errors and separate them
from the real message. In general, when Alice receives
the string of Nc binary devices and she erases it, the
minimum amount of entropy that she generates is equal
to Nc × k ln 2. Now we can figure out how much of that
entropy needs to be wasted to extract the real message
from this (redundant) string of binary digits. No matter
how sophisticated Bob's coding was, there is no way
that Alice could isolate the errors without using at least
NcH(q) bits of information. In fact, even if she can compress
the errors in a block of digits and concentrate the
message in the remaining block she would still need at
least NcH(q) binary digits for the errors. Note that we
are by no means proving that she will be able to achieve
this efficiency, but only that the errors cannot be compressed
into a block of fewer than NcH(q) binary letters. But,
if Alice and Bob could devise such a strategy, something
much more sophisticated than the naive idea suggested
above, then they would really have Nc − NcH(q) bits
available for error free communication. That means that
there is an upper bound on the information capacity of
any classical noisy channel given by



N = N_c (1 - H(q))    (16)



where N is the size of the message effectively transmit- 
ted, Nc is the size of the (larger) coded message and q is 
the probability that each bit will flip under the effect of 
the noise. The rigorous proof that this bound is indeed 
achievable was given by Shannon (see textbooks such as 
PH). The reader interested in more details can consult 
the Feynman lectures on computation on which this short 
treatment was based Q. 
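To get a feeling for the bound in equation (16), the sketch below evaluates it for a coded message of 1000 binary letters. It assumes base-2 logarithms in H(q); the function names binary_entropy and capacity_bound are ours and only illustrative.

```python
from math import log2

def binary_entropy(q: float) -> float:
    """H(q) = -[q log2 q + (1-q) log2(1-q)], measured in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * log2(q) + (1 - q) * log2(1 - q))

def capacity_bound(n_c: int, q: float) -> float:
    """Upper bound N = Nc (1 - H(q)) on the reliably transmitted bits."""
    return n_c * (1 - binary_entropy(q))

for q in (0.01, 0.1, 0.5):
    print(f"q={q}: at most {capacity_bound(1000, q):.1f} bits in 1000 letters")
```

For q = 1% the bound allows roughly 0.92 bits per binary letter, far above the rate 1/3 of the three-digit repetition example, while for q = 1/2 the bound vanishes because the output no longer carries any information about the input.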

The problem of the noisy channel concludes our survey 
of classical information encoded in classical systems. If 
you have a look at the map of this paper you will see 
that we have gone through one of the 4 columns of topics 
shown pictorially in figure 0. The rest of this paper will 
deal with topics that require a grasp of the basic princi- 
ples and mathematical methods of quantum mechanics. 
The next section is a quick recap that should be of help to 
those with a more limited background. If the reader feels 
confident in the use of the basics of quantum mechanics, 
the density operator and tensor products, then he can 
just skip this part and move on to the next section. 



III. A CRASH COURSE ON QUANTUM 
MECHANICS 



At the end of our discussion on the Maxwell's demon 
paradox, we started putting forward the idea that the 
information we have on the state of a classical system 
contributes to define the state itself. In this section we 
will push our arguments even further and investigate the 
role that the concept of information plays in the basic 
formalism of quantum mechanics. 







A. To be or to know 

The quantum state of a physical system is usually represented
mathematically by a vector |ψ⟩ or a matrix ρ in a
complex vector space called the Hilbert space |lj.|l9| . 
We will explain the rules and the reasoning behind this 
representation in the next sections by considering two- 
level quantum systems as an easy example that displays 
most of the features of the general case. 

But, first of all, what do the mathematical symbols ex- 
actly represent? In this article, we take the pragmatic 
point of view that what is being represented is not the 
quantum system itself but rather the information that we 
have about its preparation procedure. As an example that 
illustrates this point, we consider the process by which 
an atom prepared in an arbitrary superposition of en- 
ergy eigenstates collapses into only one of the eigenstates 
after the measurement is done. This process seems to 
happen instantaneously unlike the ordinary time evolu- 
tion of quantum states. Generations of physicists have 
been puzzled by this fact and have searched for the phys- 
ical mechanism which causes the collapse of the wave 
function. However, if we consider the wave-function to 
represent only the information that we possess about 
the state of the quantum system, we will definitely ex- 
pect it to change discontinuously after the measurement 
has taken place, because our knowledge has suddenly in- 
creased. Not everybody is satisfied with this view. Some 
people think that physical theories should deal with ob- 
jective properties of Nature, with what is really out there 
and avoid subjectivism. It is difficult to assess the valid- 
ity of these arguments entirely on philosophical grounds. 
To our knowledge there are no experiments that provide 
compelling evidence in favor of any of the existing inter- 
pretational frameworks. Therefore we will adopt what we 
feel is the easiest way out of the problem and explain the 
rules for representing mathematically our knowledge of 
the preparation procedure of an arbitrary quantum state.



B. Pure states and complete knowledge 

1. Pure states of a single system 

We start by considering how to proceed when we have 
complete knowledge on the preparation procedure of a 
single quantum system. In this simpler case, we say that 
the state of the quantum system is pure and we represent 
our complete knowledge of its preparation procedure as 
a vector in a complex vector space. As an example, consider
two non-orthogonal states of a two-level atom, |ψ_1⟩
and |ψ_0⟩. These states are arbitrary superpositions of the
two energy eigenstates. In the next few lines, we show 
how to write them as two 2-dimensional vectors 



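|\psi_1\rangle = \frac{1}{\sqrt{5}} |0\rangle + \frac{2}{\sqrt{5}} |1\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}    (17)

|\psi_0\rangle = \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}    (18)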



The rule used above to convert from Dirac to matrix
notation is to write the energy eigenstates |0⟩ and |1⟩
as the column vectors (1, 0)^T and (0, 1)^T, respectively.
There is nothing mystical behind the choice of this
correspondence. One could have also chosen the basis vectors
(1/√2)(1, 1)^T and (1/√2)(1, −1)^T instead. What is important
is that the two vectors are orthogonal and normalized
so that they can faithfully represent the important
experimental property that the two states |0⟩ and |1⟩ are
orthogonal and can be perfectly distinguished in a measurement.
The important point to observe in the choice
of the basis in which to represent your state-vectors is
that of consistency. Every physical quantity has to be
represented in the same basis when you bring them together
in computations. If one has used different bases
for representation, then one has to rotate them into one
standard basis using unitary transformations. This rotation
can be expressed mathematically as a 2 × 2 unitary
matrix U. A unitary matrix is defined by the requirement
that U U† = U† U = 1. Given a set of quantities in
one basis then upon rewriting them in another basis, the
predictions for all physically observable quantities have
to remain the same. This essentially requires that the
mathematical expressions that are used to express these
observable quantities have to be invariant under unitary
transformations. We will see examples of this soon.

Above we have seen examples of orthogonal states
(namely the basis states |0⟩ and |1⟩, written as the column
vectors (1, 0)^T and (0, 1)^T). In general two quantum states will
be neither orthogonal nor parallel, such as for example
the states |ψ_0⟩ and |ψ_1⟩. To quantify the angle between
two vectors |ψ_i⟩ and |ψ_j⟩ we introduce the complex scalar
product. For complex vectors with two components it is
given by

\langle\psi_j|\psi_i\rangle = (a_j^* \langle 0| + b_j^* \langle 1|)(a_i |0\rangle + b_i |1\rangle) = a_j^* a_i + b_j^* b_i    (19)

Note that the components of the first vector have to be
complex conjugated, but apart from that the complex
scalar product behaves just as the ordinary real scalar
product. One nice property of the scalar product is the
fact that it is invariant under unitary transformations,
just as you would expect for a quantity that measures
the angle between two state vectors.
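The scalar product and its invariance under unitary transformations can be illustrated with a few lines of code. This is only a sketch: the amplitudes of psi_a and psi_b and the particular unitary U are arbitrary choices of ours, and the helper names inner and apply are hypothetical.

```python
from math import sqrt, isclose

def inner(u, v):
    """Complex scalar product <u|v>; the components of u are conjugated."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def apply(matrix, vec):
    """Matrix-vector product, used here to rotate a state into another basis."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

ket0 = [1, 0]                          # energy eigenstate |0>
ket1 = [0, 1]                          # energy eigenstate |1>
psi_a = [1 / sqrt(2), 1j / sqrt(2)]    # illustrative amplitudes (assumption)
psi_b = [1 / sqrt(5), 2 / sqrt(5)]     # illustrative amplitudes (assumption)

# An example unitary (assumption): its rows are orthonormal complex vectors.
U = [[1 / sqrt(2), 1 / sqrt(2)],
     [1j / sqrt(2), -1j / sqrt(2)]]

before = inner(psi_a, psi_b)
after = inner(apply(U, psi_a), apply(U, psi_b))

print(inner(ket0, ket1))           # the basis states are orthogonal
print(abs(inner(psi_a, psi_a)))    # a normalized state has unit length
print(before, after)               # the same number in both bases
```

Swapping the two arguments of inner returns the complex conjugate of the original result, which is the only place where the complex scalar product differs from the real one.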



2. Operators and probabilities for a single system 

In our new language of state vectors, the dot product
⟨ψ_i|ψ_j⟩ is analogous to the overlap integral between
two wave-functions ψ_i(x) and ψ_j(x), that is usually
encountered in introductory courses of quantum mechanics.
The reader may recall that the squared modulus of the
overlap integral, written as |⟨ψ_i|ψ_j⟩|², can be interpreted
as the probability of projecting the quantum state |ψ_i⟩
on the eigenstate |ψ_j⟩ of an appropriate observable after
the measurement is performed.

Now we would like to represent this projection mathematically
by a projection operator denoted by |ψ⟩⟨ψ|.
This projector is simply a matrix that maps all the vectors
onto the vector corresponding to |ψ⟩, apart from a
normalization constant. The recipe to construct the matrix
representation of |ψ⟩⟨ψ| is to multiply the column
vector |ψ⟩ times the row vector ⟨ψ| as shown below:

|\psi\rangle\langle\psi| = (a|0\rangle + b|1\rangle)(a^* \langle 0| + b^* \langle 1|) = \begin{pmatrix} a \\ b \end{pmatrix} \begin{pmatrix} a^* & b^* \end{pmatrix} = \begin{pmatrix} |a|^2 & ab^* \\ a^*b & |b|^2 \end{pmatrix}    (20)



For example, the reader can easily construct the matrix
representing the projector |1⟩⟨1| and check that when it
operates on the state |ψ_0⟩ in equation (18) we indeed obtain
the excited state |1⟩ apart from a normalization constant.
Furthermore, the probability of finding the state |ψ⟩ in a
measurement of a system originally in the quantum state
|φ⟩ is given by

Prob_{|\psi\rangle} = |\langle\psi|\phi\rangle|^2 = tr\{ |\psi\rangle\langle\psi|\phi\rangle\langle\phi| \}    (21)

where tr denotes the trace which is the sum of the diagonal
elements of a matrix, a concept that is invariant
under unitary transformations. The reader can easily
check that Eq. (21) is true by explicitly constructing the
matrices |ψ⟩⟨ψ| and |φ⟩⟨φ| (see equation (20)), multiplying
them, taking the trace, and verifying that the result is indeed
equal to |⟨ψ|φ⟩|², calculated after squaring the result of
equation (19). Once this is done it is easy to write the
expectation value of any observable whose eigenvalues are
the real numbers {e_i} and whose eigenstates are the vectors
{|e_i⟩}. In fact, if we label the probability of projecting
on the eigenstate |e_i⟩ as Prob_{|e_i⟩} and we make use of
equation (21), we can indeed write the expectation value
for any observable O of the two level system in a given
state |ψ⟩ as

\langle O \rangle_{|\psi\rangle} = e_0 Prob_{|e_0\rangle} + e_1 Prob_{|e_1\rangle}
    = e_0 tr\{ |e_0\rangle\langle e_0|\psi\rangle\langle\psi| \} + e_1 tr\{ |e_1\rangle\langle e_1|\psi\rangle\langle\psi| \}
    = tr\{ (e_0 |e_0\rangle\langle e_0| + e_1 |e_1\rangle\langle e_1|) |\psi\rangle\langle\psi| \}.    (22)

The expression above can be tidied up a bit by defining
the observable O as the matrix

O = e_0 |e_0\rangle\langle e_0| + e_1 |e_1\rangle\langle e_1|    (23)



Note that in order to use the projectors to calculate
probabilities as in equation (22), we have to demand that the
sum of the matrices representing the projectors must be
the identity matrix. For a two dimensional vector space
this means that |0⟩⟨0| + |1⟩⟨1| = 1. This condition ensures
that the sum of the probabilities obtained using
equation (22) is equal to 1. Once we check this important
property of the projectors we can use equation (23) to
construct the matrix representation of any observable. For
example, the reader can check that the energy observable
E can be written using the basis (1, 0)^T and (0, 1)^T in the
form:

E = e_0 |e_0\rangle\langle e_0| + e_1 |e_1\rangle\langle e_1| = e_0 \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + e_1 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} e_0 & 0 \\ 0 & e_1 \end{pmatrix}    (24)



Note that the energy operator is diagonal in this basis
because these basis vectors were originally chosen as the
energy eigenvectors! However, the prescription given in
equation (23) to represent any observable O ensures that
the resulting matrix is Hermitian because the projectors
themselves are Hermitian. A matrix is said to be Hermitian
if all its entries that are symmetrical with respect
to the principal diagonal are complex conjugates of each
other (see equation (20)). The fact that the matrix O is
Hermitian ensures that its eigenvectors are orthogonal
and the corresponding eigenvalues are real. This means
that the possible "output states" after the measurement
are distinguishable and the corresponding results are real
numbers. Once you accept equation (23), you can immediately
write equation (22) simply as

\langle O \rangle = tr\{ O |\psi\rangle\langle\psi| \}    (25)



This completes our quick survey of the rules to represent 
the arbitrary state of a single two level quantum system. 
The main motivation to adopt these rules is dictated by 
their ability to correctly predict experimental results. 
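The rules of this section can be collected in a short numerical sketch: build projectors as outer products, obtain probabilities from the trace rule, and compare the two ways of computing an expectation value. The state psi and the eigenvalues e0, e1 are illustrative assumptions of ours, and the helper names outer, matmul and trace are hypothetical.

```python
from math import sqrt, isclose

def outer(u, v):
    """Matrix |u><v|: column vector u times the conjugated row vector v."""
    return [[x * complex(y).conjugate() for y in v] for x in u]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(m):
    """Sum of the diagonal elements."""
    return sum(m[i][i] for i in range(len(m)))

ket0, ket1 = [1, 0], [0, 1]          # energy eigenstates as column vectors
psi = [1 / sqrt(5), 2 / sqrt(5)]     # illustrative state (assumption)
rho = outer(psi, psi)                # projector |psi><psi|

# Probabilities from the trace rule tr{|e><e| |psi><psi|}.
p0 = trace(matmul(outer(ket0, ket0), rho)).real
p1 = trace(matmul(outer(ket1, ket1), rho)).real

# Observable built from its eigenvalues and projectors.
e0, e1 = -1.0, 1.0                   # hypothetical eigenvalues
observable = [[e0 * x + e1 * y for x, y in zip(r0, r1)]
              for r0, r1 in zip(outer(ket0, ket0), outer(ket1, ket1))]
expectation = trace(matmul(observable, rho)).real

print(p0, p1, p0 + p1)
print(expectation, e0 * p0 + e1 * p1)
```

The two numbers on the last line agree, which is the content of the step from the weighted sum of probabilities to the single trace formula.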
3. Non-orthogonality and inaccessible information

We would like to expand a little bit on the important
concept of non-distinguishability between two quantum
states. By this we mean the following. Suppose that you
are given two two-level atoms in states |ψ_0⟩ and |ψ_1⟩
respectively (see equations (18) and (17)) and you are asked to
work out which particle is in state |ψ_1⟩ and which in state
|ψ_0⟩. The two states are said to be non-distinguishable
if you will never be able to achieve this task without
the possibility of a wrong answer when you are given
only one copy of each system, irrespective of the observable
you choose to measure. For example you could decide to measure
the energy of the two atoms. After using equation (21), or
just by inspection, you can verify that the probability of
finding the atom in the excited state if it was in state
|ψ_0⟩ before the measurement is equal to 1/2. On the other
hand, you can also check that the probability of finding
the atom in the excited state if it was in state |ψ_1⟩ before
the measurement is also non-vanishing and in fact
equal to 4/5. Now, suppose that you perform the measurement
and you find that the atom is indeed in the excited
state. At this point, you still cannot unambiguously decide
whether the atom had been prepared in state |ψ_0⟩
or |ψ_1⟩ before the measurement took place. In fact, by
measuring any other observable only once you will never
be able to distinguish between two non-orthogonal states
with certainty.

This situation is somewhat surprising because the two
non-orthogonal states are generated by different preparation
procedures. Information was invested to prepare
the two states, but when we try to recover it with a single
measurement we fail. The information on the superposition
of states in which the system was prepared remains
inaccessible to us in a single measurement.

It is sometimes argued that we therefore have to assume
that a single quantum mechanical measurement
does not give us any information. This viewpoint is, however,
wrong. Consider the situation above again, where
we either have the state |ψ_0⟩ or the state |ψ_1⟩ with a priori
probabilities 1/2 each. If we find in a measurement the
excited state of the atom, then it would be a fair guess to
say that it is more likely that the system was in state |ψ_1⟩
because this state has the higher probability to yield the
excited state in a measurement of the energy. Therefore
the a posteriori probability distribution for the two states
has changed, and therefore we have gained knowledge as
we have reduced our uncertainty about the identity of
the quantum state.

The non-distinguishability of non-orthogonal quantum
states is an important aspect of quantum mechanics and
will be encountered again several times in the remainder
of this article.



4. Two 2-level quantum systems in a joint pure state

We have gained a good grasp of the properties of an
isolated two-level quantum system. We are now going to
study how the joint quantum state of two such systems
(say a pair of two level atoms) is represented mathematically.
The generalization is straightforward. We initially
concentrate on the situation when our knowledge of the
preparation procedure of the joint state is complete, i.e.
when the joint system is in a pure state. The reader who
is not very familiar with quantum mechanics may wonder
why we have to include this section at all. After all,
according to classical intuition, the
state of a joint quantum system comprised of two subsystems
A and B can be given by simply providing, at
any time, the state of each of the subsystems A and B
independently. This reasonable conclusion turns out to
be wrong in many cases! Let us see why.

We first consider one of the most intuitive examples of a
joint state of the two atoms: the case in which atom A is
in its excited state |1⟩_A and atom B in its ground state
|0⟩_B, where the subscript labels the atoms and the binary
number their states. In this case, the joint state of the
two atoms |ψ_AB⟩ can be fully described by stating the
state of each atom individually so we write |ψ_AB⟩ down
symbolically as |1⟩_A|0⟩_B. We call this state a product
state. We now decide to represent the joint state |1⟩_A|0⟩_B
of the two atoms as a vector in an enlarged Hilbert space
whose dimensionality is no longer 2 as for a single atom
but 2 × 2 = 4. The vector representation of |1⟩_A|0⟩_B
is constructed as shown below:

|\psi_{AB}\rangle = |1\rangle_A \otimes |0\rangle_B = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}    (26)

Equation (26) defines the so called tensor product between
two vectors belonging to two different Hilbert spaces, one
used to represent the state of atom A and the other for
atom B. For the readers who have never seen the symbol
⊗ we write down a more general case involving the
two vectors |ψ_A⟩ with coefficients a and b and |ψ_B⟩ with
coefficients c and d:

|\psi_{AB}\rangle = |\psi_A\rangle \otimes |\psi_B\rangle = \begin{pmatrix} a \\ b \end{pmatrix} \otimes \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} ac \\ ad \\ bc \\ bd \end{pmatrix}    (27)



The case of the tensor product between n dimensional vectors
is a simple generalization of the rule of multiplying
component-wise as above. Using equation (27) the
reader can work out the vector representation of the following
states:

|0\rangle_A |0\rangle_B = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}    (28)

|0\rangle_A |1\rangle_B = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}    (29)

|1\rangle_A |1\rangle_B = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}    (30)

A trick to write the states above as vectors without explicitly
performing the calculation in equation (26) is the
following. First, read the two digits inside |...⟩|...⟩ as a
two-digit binary number (for example read |0⟩|1⟩ as 1), and
add 1 to get the resulting number n. Then place a 1
in the n-th entry of the column vector and 0s in all the
others. The four state-vectors in equations (26), (28), (29)
and (30) are a complete set of orthogonal basis vectors for
our four-dimensional Hilbert space. Therefore, any state
|ψ_AB⟩ of the form |ψ_A⟩|ψ_B⟩ in equation (27) can be written
as:

|\psi_{AB}\rangle = ac |0\rangle_A |0\rangle_B + ad |0\rangle_A |1\rangle_B + bc |1\rangle_A |0\rangle_B + bd |1\rangle_A |1\rangle_B    (31)

where we wrote the vectors symbolically, in Dirac notation,
to save paper. We interpret the coefficients of
each basis vector in terms of probability amplitudes, as
we did for single systems. For example, the modulus
squared |ad|² gives the probability of finding atom A in
its ground state and atom B in the excited state after an
energy measurement. A question that arises naturally
after inspecting the equation above is the following:

What happens when I choose the coefficients of the superposition
in equation (31) in such a way that it is impossible
to find two vectors |α⟩_A and |β⟩_B that "factorize"
the 4-dimensional vector |ψ_AB⟩ as in equation (27)? Are
these non factorizable vectors a valid mathematical representation
of quantum states that you can actually prepare
in the lab?



5. Bipartite Entanglement

The answer to the previous question is a definite yes.
Before expanding on this point, let us write an example
of a non factorizable vector:

|\psi_{AB}\rangle = \frac{1}{\sqrt{2}} |0\rangle_A |0\rangle_B + \frac{1}{\sqrt{2}} |1\rangle_A |1\rangle_B    (32)



The vector above corresponds to the state for which there 
is equal probability of finding both atoms in the excited 
state or both in the ground state. The reader can perhaps 
make a few attempts to factorize this vector, but they are 
all going to be unsuccessful. This vector, nonetheless, 
represents a perfectly acceptable quantum state. In fact, 
according to the laws of quantum mechanics, ANY vector
in the enlarged Hilbert space is a valid physical state
for the joint system of the two atoms, independently of it
being factorizable or not. In fact, in section V B 2 we will
show that for an n-partite system most of the states are
actually non factorizable. So these states are the norm
rather than the exception!
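The tensor product rule, the binary-number trick and the notion of factorizability can be sketched in code. The factorizability test below is not taken from the text: it uses the standard fact that a two-atom vector with components (α, β, γ, δ) is a product state exactly when αδ − βγ = 0. The function names kron_vec, basis_vector and is_factorizable, as well as the amplitudes of the product state, are illustrative assumptions.

```python
from math import sqrt

def kron_vec(u, v):
    """Tensor product of two column vectors: every component of u times all of v."""
    return [x * y for x in u for y in v]

def basis_vector(bits):
    """Column vector for |b1>|b2>: a 1 in entry n = int(bits, 2) + 1, counting from 1."""
    vec = [0] * (2 ** len(bits))
    vec[int(bits, 2)] = 1   # zero-based index, i.e. the text's n minus 1
    return vec

def is_factorizable(state, tol=1e-12):
    """Two-atom product-state test (standard criterion, an addition of ours)."""
    alpha, beta, gamma, delta = state
    return abs(alpha * delta - beta * gamma) < tol

ket0, ket1 = [1, 0], [0, 1]
print(kron_vec(ket1, ket0), basis_vector("10"))   # both give the vector for |1>|0>

product = kron_vec([0.6, 0.8], [1 / sqrt(2), 1 / sqrt(2)])  # illustrative amplitudes
correlated = [1 / sqrt(2), 0, 0, 1 / sqrt(2)]               # equal mix of |0>|0> and |1>|1>
print(is_factorizable(product), is_factorizable(correlated))
```

For the product state the combination αδ − βγ equals (ac)(bd) − (ad)(bc) = 0 identically, whereas for the second vector it equals 1/2, so no choice of single-atom vectors reproduces it.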

The existence of non-factorizable states is not too difficult
to appreciate mathematically, but it leads to some
unexpected conceptual conclusions. If the quantum state
of a composite system cannot be factorized then it is impossible
to specify a pure state of its constituent components.
More strangely perhaps, non-factorizable states,
such as |ψ_AB⟩ in equation (32), are pure states. This means
that the corresponding vectors are mathematical representations
of our complete knowledge of their preparation
procedure. There is nothing more we can in principle
know about these composite quantum objects than
what we have written down, but nonetheless we still cannot
have full knowledge of the state of their subsystems.
With reference to the discussion following equation (32),
we conclude that in a non-factorizable state we have
knowledge of the correlation between measurement outcomes
on atoms A and B but we cannot in principle identify
a pure state with each of the atoms A and B individually.
This phenomenon seemed very weird to the fathers of
quantum mechanics, who introduced the name entangled
states to denote states whose corresponding vectors cannot
be factorized in the sense explained above. In section
VI, which is entirely devoted to this topic, we will go beyond
the dry mathematical notion of non-factorizability
and start exploring the physical properties that make entangled
states peculiar. We will focus on possible applications
of these weird quantum objects in the lab. But
before doing that, the reader will have to swallow another
few pages of definitions and rules because we have not yet
explained how to construct and manipulate operators
acting on our enlarged Hilbert space.
6. Operators and probabilities for two systems

In this section, we generalize the discussion of projection
operators and observables given previously for single
quantum systems to systems consisting of two particles.
The generalization to n-particle systems should then be
obvious. We start by asserting that the rules stated in
equations (21) and (25) for single quantum systems are still
valid, with the only exception that now observables and
projection operators are represented by 4 × 4 matrices.
Imagine that you want to write down the joint observable
O_A ⊗ O_B where O_A and O_B are possibly different
observables acting respectively on the Hilbert space of
particle A and of particle B. The rule to write down the
joint observable is the following:

O_{AB} = O_A \otimes O_B = \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} \otimes \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix} = \begin{pmatrix} a_1 a_2 & a_1 b_2 & b_1 a_2 & b_1 b_2 \\ a_1 c_2 & a_1 d_2 & b_1 c_2 & b_1 d_2 \\ c_1 a_2 & c_1 b_2 & d_1 a_2 & d_1 b_2 \\ c_1 c_2 & c_1 d_2 & d_1 c_2 & d_1 d_2 \end{pmatrix}    (33)



where the subscript 1 denotes the operator on particle A
and the subscript 2 the operator on particle B. However,
there are some observables O_AB whose corresponding
matrices cannot be factorized as in equation (33). These
matrices still represent acceptable observables provided
that they are Hermitian.

Furthermore, it is possible to construct projectors on
any 4-dimensional vector by using the same principle illustrated in
equation (20). For example, the projector on the entangled
state |ψ_AB⟩ in equation (32) can be written as

|\psi_{AB}\rangle\langle\psi_{AB}| = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 0 & 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}    (34)



Finally, suppose you are interested in knowing the probability
of projecting atom A onto its ground state |0⟩_A and
atom B onto its excited state |1⟩_B after performing a
measurement on the maximally correlated state |ψ_AB⟩
considered above. How do you proceed? The answer to
this question should be of guidance also for other cases,
so we work it out in some detail. The first thing you do
is to construct the tensor product of the matrices corresponding
to the single particle projectors |0⟩⟨0| and |1⟩⟨1|
that project particle A onto its ground state and particle
B onto its excited state:

|0\rangle\langle 0| \otimes |1\rangle\langle 1| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}    (35)

Once you have worked out the matrix in equation (35) you
can multiply it with the matrix found in equation (34) and
take the trace, as explained for single particles in equation
(21). The result is 0, as expected, since we have maximal
correlations between the two atoms in state |ψ_AB⟩.
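The calculation just described can be reproduced with a short sketch. It assumes the maximally correlated state with equal amplitudes 1/√2 on |0⟩|0⟩ and |1⟩|1⟩; the helper names outer, kron_mat, matmul and trace are ours and purely illustrative.

```python
from math import sqrt, isclose

def outer(u, v):
    """Matrix |u><v|: column vector u times the conjugated row vector v."""
    return [[x * complex(y).conjugate() for y in v] for x in u]

def kron_mat(a, b):
    """Tensor product of two matrices: each entry of a multiplies a full copy of b."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(m):
    """Sum of the diagonal elements."""
    return sum(m[i][i] for i in range(len(m)))

ket0, ket1 = [1, 0], [0, 1]
correlated = [1 / sqrt(2), 0, 0, 1 / sqrt(2)]   # assumed joint state of atoms A and B
rho = outer(correlated, correlated)             # 4 x 4 projector on the joint state

# Joint projectors: atom A in |0> with atom B in |1>, and both atoms in |0>.
proj_01 = kron_mat(outer(ket0, ket0), outer(ket1, ket1))
proj_00 = kron_mat(outer(ket0, ket0), outer(ket0, ket0))

p_01 = trace(matmul(proj_01, rho)).real
p_00 = trace(matmul(proj_00, rho)).real
print(p_01, p_00)
```

The anticorrelated outcome has zero probability, while finding both atoms in the ground state occurs half of the time, in line with the perfect correlations discussed in the text.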



C. Mixed states and incomplete knowledge 

1. Mixed states of a single two-level atom 

In this section, we explain how to represent mathemat- 
ically the state of a quantum system whose preparation 
procedure is not completely known to us. This lack of 
knowledge may be caused by random errors in the appa- 
ratus that generates our quantum systems or by fluctua- 
tions induced by the environment. In these cases we say 
that the quantum system is in a mixed state. This can 
be contrasted with the pure states considered in the pre- 
vious sections for which there was no lack of knowledge 
of the preparation procedure (i.e. the quantum states 
were generated by a perfect machine whose output was 
completely known to us). To some extent, by considering 
mixed states, we start dealing with "real world quantum
mechanics". We will build on the example introduced in
section III B 1 to make our treatment more accessible.
An experimentalist needs to prepare two-level atoms in
the state |ψ_1⟩ to be subsequently used in an experiment.
He has at his disposal an oven that generates atoms in
the state |ψ_1⟩ with probability p_1 = 95% (see Fig. 7 for
illustration).



FIG. 7. An oven emits atomic two-level systems. The internal state of the system is randomly distributed. With probability $p_i$ the system is in the pure state $|\psi_i\rangle$. A person oblivious to this random distribution measures observable $A$. What is the mean value that he obtains?

In the remaining $p_0 = 5\%$ of the cases the oven fails and generates atoms in a different state $|\psi_0\rangle$. This preparation procedure is pretty efficient, but of course still different from the ideal case. The experimentalist collects the atoms, but he does not know for which of them the preparation has been successful because the experimental errors occur randomly in the oven. Neither can he measure the atoms, because he is afraid of perturbing their quantum state. The only thing he knows is the probability distribution of the two possible states. The experimentalist has to live with this uncertainty. However, he is aware that, if he uses the states produced by the oven, his experimental results are going to be different from the ones he would have obtained had he used atoms in the state $|\psi_1\rangle$ exactly, because the oven occasionally outputs atoms in the undesired state $|\psi_0\rangle$. He would like to find an easy way to compute the measurement results in this situation, so he asks a theorist to help him model his experiments. The first task the two have to face is to construct a mathematical object that represents their incomplete knowledge of the preparation procedure. Intuitively, it cannot be the vector $|\psi_1\rangle$ because of that 5% probability of getting the state $|\psi_0\rangle$. The way the two approach the problem is a good example of empirical reasoning, so it is worth exploring their thought process in some detail. The theorist asks the experimentalist to describe what he needs to do with these atoms, and the two reach the conclusion that what really matters to them are the expectation values of arbitrary observables measured on the states generated by the oven. The theorist points out that, after performing measurements on $N$ atoms, the experimentalist will have used, approximately, $N p_1$ atoms in the state $|\psi_1\rangle$ and $N p_0$ atoms in the state $|\psi_0\rangle$. For each of the two states they would know how to calculate the expectation value for any observable $A$ that the experimentalist wants to measure. Using equation 23, the theorist rewrites the expectation value of the observable $A$ on the state $|\psi_i\rangle$ as $\mathrm{tr}\{A|\psi_i\rangle\langle\psi_i|\}$. The two are now only one step away from the result. What they need to do is to average the two expectation values for the states $|\psi_1\rangle$ and $|\psi_0\rangle$ with the respective probabilities. The mean value observed by the experimentalist is thus given by:

$$\langle A \rangle = \sum_i p_i\, \mathrm{tr}\{A|\psi_i\rangle\langle\psi_i|\} = \mathrm{tr}\Big\{A \sum_i p_i |\psi_i\rangle\langle\psi_i|\Big\} \qquad (36)$$



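The calculation above can be tidied up a bit by defining the density operator $\rho$ as

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| \qquad (37)$$

Once this is done, equation 36 can be compactly written as

$$\langle A \rangle = \mathrm{tr}\{A\rho\} \qquad (38)$$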



A glance at these few lines of mathematics convinces the two physicists that they have actually solved their problem. In fact the density operator is the mathematical description of the knowledge the two have about the quantum states prepared by the oven. Equation 38, on the other hand, tells them exactly how to use their knowledge to compute the expectation value of any operator.



Similarly, they can write down the probability of finding the system in any state $|\sigma\rangle$ after a measurement by simply constructing the projector $|\sigma\rangle\langle\sigma|$. After this, they just multiply it with the density operator and take the trace (as in equation 21):

$$\mathrm{Prob}(\sigma) = \mathrm{tr}\{|\sigma\rangle\langle\sigma|\rho\} \qquad (39)$$
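The equivalence between averaging over the ensemble and taking the trace with the density operator can be illustrated with a short numerical sketch. The two states and the observable below are hypothetical stand-ins (the actual states of section III B 1 are not reproduced here); only the probabilities 5% and 95% come from the text.

```python
import math

def outer(v):
    """Projector |v><v| for a real, normalized vector v."""
    return [[a * b for b in v] for a in v]

def expectation_pure(A, v):
    """Expectation value <v|A|v> for a real vector v."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def trace_of_product(A, B):
    """tr{A B}, computed without building the full matrix product."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

probs = [0.05, 0.95]   # p_0 and p_1 from the oven example
theta = 0.4            # hypothetical angle defining the second state
states = [[1.0, 0.0], [math.cos(theta), math.sin(theta)]]  # assumed |psi_0>, |psi_1>
A = [[1.0, 0.0], [0.0, -1.0]]  # hypothetical observable (diagonal, eigenvalues +1, -1)

# Density operator: weighted sum of the projectors, as in equation 37.
rho = [[sum(p * outer(s)[i][j] for p, s in zip(probs, states))
        for j in range(2)] for i in range(2)]

ensemble_average = sum(p * expectation_pure(A, s) for p, s in zip(probs, states))
trace_formula = trace_of_product(A, rho)                  # equation 38
prob_ground = trace_of_product(outer([1.0, 0.0]), rho)    # equation 39 with |sigma> = |0>
```

Both routes give the same mean value, and the probability of finding the atom in $|0\rangle$ is simply the upper-left entry of the density matrix.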



Equation 37 provides the recipe for constructing the density matrix for the example above. We leave it as an exercise to the reader to show that the density operator representing the preparation procedure described above can be written as

$$\rho = \begin{pmatrix} 0.785 & 0.405 \\ 0.405 & 0.215 \end{pmatrix} \qquad (40)$$

One can see that the trace of the density operator $\rho$ in equation 40 is equal to 1. This is not an accident but a distinctive property of any density operator. You can easily check this by plugging the identity matrix rather than the operator $A$ into equation 38. The expectation value of the identity operator on any normalized state vector is 1 (i.e. the expectation value reduces to the dot product of the normalized state vector with itself). That in turn implies via equation 38 that the trace of $\rho$ is 1.

To sum up, one can use density operators in matrix form to represent both states of complete and incomplete knowledge (i.e. pure or mixed states). We saw, however, that for pure states a vector representation is sufficient. If one wants to use the same mathematical tool to write down any state, irrespective of the knowledge one holds on its preparation procedure, then the method of choice is the density operator (also called density matrix). A system is in a pure state when the corresponding density operator in equation 37 contains only one term. In this ideal case there is no lack of knowledge on the preparation of the system: the preparation procedure generates the desired output with unit probability. This implies that the diagonalized density matrix representing a pure state has all entries equal to zero except one entry equal to 1 on the principal diagonal. Therefore, if you take the trace of the diagonalized density matrix squared, you will still get one. Furthermore, the trace of the diagonalized density matrix squared is equal to the trace of the original density matrix squared (remember that the trace is invariant under unitary transformations). This observation is the basis of a criterion to check whether a given density matrix represents a pure or a mixed state. The test consists in taking the trace of the density matrix squared. If the trace is equal to 1, then the state is pure; otherwise it is mixed. We recall that a mixed state arises in situations where the preparation procedure is faulty and the result is a distribution of different outputs, each occurring with a given probability.
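As a quick illustration of the purity criterion, the sketch below applies it to the matrix of equation 40 and to a pure-state projector. The function names and the numerical tolerance are illustrative assumptions.

```python
def purity(rho):
    """Return tr{rho^2} for a square matrix given as nested lists."""
    n = len(rho)
    return sum(rho[i][k] * rho[k][i] for i in range(n) for k in range(n))

def is_pure(rho, tol=1e-9):
    """A state is pure when tr{rho^2} equals 1 (within a numerical tolerance)."""
    return abs(purity(rho) - 1.0) < tol

rho_oven = [[0.785, 0.405], [0.405, 0.215]]  # density matrix of equation 40
rho_pure = [[0.5, 0.5], [0.5, 0.5]]          # projector onto (|0> + |1>)/sqrt(2)

oven_purity = purity(rho_oven)  # about 0.9905, slightly below 1
```

The oven state gives a purity just below one, which flags it as mixed, whereas the projector returns exactly one.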
2. Mixed states for two quantum systems



3. The reduced density operator 



Our treatment of density operators for single quantum systems can be applied to bipartite systems with no essential modification. Let us consider an example in which an experimental apparatus produces the maximally entangled state $|\psi_{AB}\rangle$ (see equation 32) with probability $p_0$ and the product state $|0\rangle_A|0\rangle_B$ with probability $p_1$. For both states we know how to construct the corresponding projectors by using the same method illustrated in equation 34. But, before writing down the resulting density operator, we introduce a small simplification in the notation used. We write the state $|0\rangle_A|0\rangle_B$ as $|00\rangle_{AB}$ or simply $|00\rangle$. The rule to write down the four-dimensional vector corresponding to this state and its interpretation do not change. The first digit still refers to atom A and the second to atom B. We can now write the corresponding density operator $\rho_{AB}$ as shown in equation 41:


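$$\rho_{AB} = p_0 |\psi_{AB}\rangle\langle\psi_{AB}| + p_1 |00\rangle\langle 00| = \frac{p_0}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} + p_1 \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \frac{p_0}{2} + p_1 & 0 & 0 & \frac{p_0}{2} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \frac{p_0}{2} & 0 & 0 & \frac{p_0}{2} \end{pmatrix} \qquad (41)$$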



There is another situation that will arise in later sections. Suppose that two distant machines are generating one atom each, but we do not know exactly the preparation procedure of each atom. Since the two machines are very far away from each other, we can ignore the interaction between the atoms and describe them separately in two different 2-dimensional Hilbert spaces by writing down the corresponding single particle density operators $\rho_A$ and $\rho_B$. All this is fine. But we may also write the joint state of these two non-interacting atoms as a density operator $\rho_{AB}$ in our 4-dimensional Hilbert space, as we did for the case considered in equation 41. How do we proceed? We simply take the tensor product between the two $2\times 2$ matrices corresponding to $\rho_A$ and $\rho_B$ to get

$$\rho_{AB} = \rho_A \otimes \rho_B \qquad (42)$$



We leave it as an exercise for the reader to choose two arbitrary density operators $\rho_A$ and $\rho_B$ and perform an explicit calculation of $\rho_{AB}$.

Once we know how to write 1) the density matrix for the joint state of the two atoms and 2) the matrix representing a joint observable or projector, we will have no trouble finding expectation values or probabilities of certain measurement outcomes. All we need to do is to multiply two $4\times 4$ matrices and take the trace, as illustrated for a single particle in equations 38 and 39.
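The exercise can be carried out numerically as follows. This is a sketch under stated assumptions: the two single-atom density matrices `rho_A` and `rho_B` are arbitrary hypothetical choices, the basis ordering is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and `mixture` is an illustrative helper that builds the two-atom mixture of the entangled state and $|00\rangle$ with $p_1 = 1 - p_0$.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two nested-list matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def trace_of_product(A, B):
    """tr{A B}, computed without building the full matrix product."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

def mixture(p0):
    """Mixture of the entangled state (weight p0) and |00> (weight 1 - p0)."""
    p1 = 1.0 - p0
    return [[p0 / 2 + p1, 0.0, 0.0, p0 / 2],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [p0 / 2, 0.0, 0.0, p0 / 2]]

rho_A = [[0.7, 0.2], [0.2, 0.3]]   # hypothetical state of atom A
rho_B = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical state of atom B
rho_AB = kron(rho_A, rho_B)        # joint state of two independent atoms

P0 = [[1, 0], [0, 0]]  # |0><0|
P1 = [[0, 0], [0, 1]]  # |1><1|
# Probability of finding A in |0> and B in |1> for the product state.
prob_01 = trace_of_product(kron(P0, P1), rho_AB)
```

For a product state the joint probability factorizes into the single-atom probabilities (here 0.7 times 0.5), which is one way to see that independently prepared atoms carry no correlations.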



There is another context in which a mixed state arises even when there is no uncertainty in the preparation procedure of the quantum system one is holding. Imagine you have an ideal machine that generates, with probability one, pairs of maximally entangled particles in the state $|\psi_{AB}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. The density operator $\rho_{AB}$ for this pure state reduces to the corresponding projector, because all the probabilities except one vanish (see the discussion at the end of section III C 1). In fact, the $4\times 4$ density matrix for this preparation procedure was explicitly calculated in equation 34.

After having created the entangled pair we decide to lock particle A in a room to which we have no access and we give particle B to our friend Bob. Bob can do any measurement he wants on particle B and he would like to be able to predict the outcomes of any of these. Evidently Bob does not know what is happening to particle A after it has been locked away and, as a consequence, he now has incomplete knowledge of the total state. The question is how we can describe mathematically the state of his particle given the incomplete knowledge that Bob has of particle A. The first point to make is that Bob still has some background knowledge on particle A because he retains information on the original preparation procedure of the entangled pair. For example, he knows that if Alice, who holds particle A, subjects her particle to an energy measurement and finds that particle A is in the ground (excited) state, then particle B has to be in the ground (excited) state too. This prediction is possible because the measurement outcomes of the two particles are always correlated, as they were prepared in the entangled state $|\psi_{AB}\rangle$. Furthermore, Bob knows from the preparation procedure that the probability that Alice finds her particle in either the ground state $|0\rangle_A$ or in the excited state $|1\rangle_A$ is $\frac{1}{2}$. By using the non-local correlations between his particle and the other, Bob concludes that particle B too is in either the ground state $|0\rangle_B$ or in the excited state $|1\rangle_B$ with probability $\frac{1}{2}$. Now let us assume that Alice indeed has measured the energy operator on her particle but, as she is inside the locked room, has not told Bob that she did this. Therefore, in half the cases Bob's particle will be in state $|0\rangle\langle 0|$ and in half the cases it will be in state $|1\rangle\langle 1|$. This is a situation that is most easily described by a density operator. We find that the state of Bob's particle is described by the reduced density operator $\rho_B$ given by:

$$\rho_B = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (43)$$



where we used the rules for the representation and manipulation of quantum states as vectors (equation 20). From the above reasoning it is perhaps not surprising that $\rho_B$ is often termed the reduced density operator. Being a mixed state, it represents Bob's incomplete information on the state of his particle (the reduced system) due to his inability to access particle A, while the total system is in a pure entangled state represented by the larger matrix $\rho_{AB}$. In fact, Bob wrote down $\rho_B$ after taking into account all information that was available to him. It is important to note that we would have obtained the same result for Bob's density operator if we had assumed any other operation on Alice's side. The key point is that, as Alice's actions do not affect Bob's particle in any physically detectable way, it should not make any difference for Bob's description of his state which assumptions he makes about Alice's actions.

The whole operation of ignoring Alice's part of the system and generating a reduced density operator only for Bob's system is sometimes written mathematically as

$$\rho_B = \mathrm{tr}_A\{\rho_{AB}\} \qquad (44)$$

The mathematical operations that one has to perform on the entries of the larger matrix $\rho_{AB}$ in order to obtain $\rho_B$ are called the partial trace over system A. The general case can be dealt with analogously to the reasoning above. One assumes that in the inaccessible system a measurement is carried out whose outcomes are not revealed to us. We then determine the state of our system for any specific outcome from the projection postulate and we use the associated probabilities to form the appropriate density operator. We refer the reader interested in learning how to deal with this method in the most efficient way to some recent courses of quantum mechanics

EM- . 
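For two two-level atoms the partial trace amounts to summing $2\times 2$ blocks of the $4\times 4$ matrix. The sketch below assumes the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ (first digit for atom A), so that the state $|ab\rangle$ sits at index $2a + b$; the function names are illustrative.

```python
def partial_trace_A(rho_ab):
    """Trace out atom A and return the 2x2 reduced matrix of atom B."""
    return [[sum(rho_ab[2 * a + b][2 * a + b2] for a in range(2))
             for b2 in range(2)] for b in range(2)]

def partial_trace_B(rho_ab):
    """Trace out atom B and return the 2x2 reduced matrix of atom A."""
    return [[sum(rho_ab[2 * a + b][2 * a2 + b] for b in range(2))
             for a2 in range(2)] for a in range(2)]

# Pure entangled state (|00> + |11>)/sqrt(2) as a 4x4 density matrix.
rho_psi = [[0.5, 0.0, 0.0, 0.5],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.5, 0.0, 0.0, 0.5]]

rho_B = partial_trace_A(rho_psi)  # what Bob can say about particle B alone
rho_A = partial_trace_B(rho_psi)  # what Alice can say about particle A alone
```

Both reduced matrices come out as one half of the identity, in agreement with the measurement-based argument given above for Bob's particle.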

This topic concludes our very concise review of quan- 
tum mechanics. We will now extensively apply the math- 
ematical tools introduced in this section to deal with 
situations in which classical information is encoded in 
a quantum system and later to discuss the new field of 
quantum information theory. It is therefore essential that 
the reader feels confident with what he has learned so far 
before moving on. 



IV. CLASSICAL INFORMATION ENCODED IN 
QUANTUM SYSTEMS 

A. How many bits can we encode in a quantum 
state? 

In the previous section, we studied two situations in which the state of a quantum system is mixed, namely when the preparation procedure is not completely known or when we have a subsystem that is part of a larger inaccessible system. In both cases, our knowledge was limited to the probabilities $\{p_i\}$ that the system is in one of the pure states $|\psi_i\rangle$. A question that arises naturally in this context is whether we can assign an entropy to a quantum system in a mixed state in very much the same way as we do with a classical system that can be in a number of distinguishable configurations with a given set of probabilities. In the classical case the answer is the well known Boltzmann formula introduced earlier. At first sight, you may think that the same formula can be applied to evaluate the mixed state entropy just by plugging in the probabilities $\{p_i\}$ that the quantum system is in one of the pure quantum states $|\psi_i\rangle$. Unfortunately, this idea does not work, because the quantum states $|\psi_i\rangle$ are different from the distinguishable configurations of a classical system in one important way. They are not always perfectly distinguishable! As we pointed out earlier, two quantum states can be non-orthogonal and therefore not perfectly distinguishable. But maybe the idea of starting from the classical case as a guide to solve our quantum problem is not that bad after all.

In particular, imagine that you are given the density matrix representing the mixed state of a quantum system. Can you perform some mathematical operations on this matrix to bring it into a form that is more suggestive? You may recall from equations 20 and 37 that the procedure to write down this density matrix is the following. First construct the matrix representation of the projector $|\psi_i\rangle\langle\psi_i|$ for each of the vectors $|\psi_i\rangle$, then multiply each of them by its respective probability and finally sum them all up in one matrix. The reader can check that the prescription on how to construct each matrix $|\psi_i\rangle\langle\psi_i|$ given in equation 20 ensures that the resulting density matrix is Hermitian. We denote the orthogonal eigenvectors of our (Hermitian) density matrix by $|e_i\rangle$. If we choose the $|e_i\rangle$ as basis vectors, we can rewrite our matrix in a diagonal form. All the entries on the diagonal are the real eigenvalues of the matrix. This matrix can now be written in Dirac notation as

$$\rho = \sum_i q_i |e_i\rangle\langle e_i| \qquad (45)$$

where the $q_i$ are now the eigenvalues of the density matrix. This new matrix actually represents another preparation procedure, namely the mixed state of a quantum system which can be in any of the orthogonal states $|e_i\rangle$ with probability $q_i$. But now the states $|e_i\rangle$ are distinguishable and therefore one can apply the Boltzmann formula by simply plugging in the eigenvalues of the matrix as the probabilities.

There is one problem in this reasoning. When you rewrite the old density matrix in diagonal form you are actually writing down a different matrix and therefore a representation of a different preparation procedure. How can you expect then that the entropy so found applies to the mixed state you considered originally? The answer to this question lies in the fact that what matters in the matrix representation of quantum mechanical observables or states is not the actual matrix itself, but only those properties of the matrix that are directly connected to what you can observe in the lab. From the previous section, we know that all the physically relevant properties are basis independent. The diagonalization procedure mentioned above is nothing else than a change of basis, and therefore there is no harm in bringing our original density matrix $\rho$ to diagonal form and hence defining the von Neumann entropy as the function

$$S(\rho) = -\mathrm{tr}\{\rho \log \rho\} \qquad (46)$$



The formula above is an example of how a function of a matrix can be evaluated as an ordinary function of its eigenvalues only. Since the eigenvalues are invariant under a change of basis, the function itself is invariant, as expected. One can check the validity of the formula above as an entropy measure by considering two limiting cases. Consider first a pure state, for which there is no uncertainty on the output of the preparation procedure. The probability distribution reduces to only one probability, which is one. Therefore the density matrix representing this state has a single non-zero eigenvalue, equal to one. If you plug the number one into the logarithm in formula 46 you get the reassuring result that the entropy of this state is zero. On the other hand, for a maximally mixed state in which the system can be prepared randomly in one of $N$ equally likely orthogonal pure states, we find that the entropy is $\log N$, in agreement (in dimensionless units) with the Boltzmann and Shannon entropies.
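The two limiting cases can be reproduced with a few lines of code. The sketch below evaluates the entropy from the eigenvalues, using logarithms to base 2, and assumes real symmetric $2\times 2$ matrices for the closed-form eigenvalue formula; the function names are illustrative.

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    half_tr = (m[0][0] + m[1][1]) / 2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(half_tr ** 2 - det, 0.0))
    return [half_tr + root, half_tr - root]

def entropy_bits(probabilities):
    """-sum q log2 q, skipping zero entries (0 log 0 is taken as 0)."""
    return sum(-q * math.log2(q) for q in probabilities if q > 1e-15)

def von_neumann_entropy_2x2(rho):
    """Von Neumann entropy (in bits) of a real symmetric 2x2 density matrix."""
    return entropy_bits(eigenvalues_2x2(rho))

S_pure = von_neumann_entropy_2x2([[0.5, 0.5], [0.5, 0.5]])   # pure state
S_mixed = von_neumann_entropy_2x2([[0.5, 0.0], [0.0, 0.5]])  # maximally mixed, N = 2
S_four = entropy_bits([0.25] * 4)                            # N = 4 equally likely states
```

The pure state has zero entropy, while the maximally mixed states give $\log_2 2 = 1$ and $\log_2 4 = 2$ bits respectively.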

There is an interesting point to note. If we create a mixed state by generating the states $\{|\psi_i\rangle\}$ with probabilities $\{p_i\}$, we first hold a list of numbers which tell us which system is in which quantum state. In this classical list each letter holds $H(\{p_i\})$ bits of information. If we want to complete the creation of the mixed state, we have to erase this list and, according to Landauer's principle, will generate $kT\ln 2\, H(\{p_i\})$ of heat per erased message letter. In general the Shannon entropy $H(\{p_i\})$ is larger than or equal to the von Neumann entropy of the density operator $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. It is also clear that the same mixed state can be created in many different ways and that the information invested into the state will not be unique. It seems therefore unclear whether we can ascribe a unique classical information content to a mixed state. However, the only quantity that is independent of the particular way in which the mixed state has been generated is the von Neumann entropy, which in general differs from the amount of information invested in the creation of the mixed state. In fact, the von Neumann entropy $S(\rho)$ is the smallest amount of information that needs to be invested to create the mixed state $\rho$. As we are unable to distinguish different preparations of the same density operator $\rho$, this is certainly the minimum amount of classical information in the state $\rho$ that we can access. The question is whether we can access even more classical information. The answer to this question is NO, as we will see in the next section, in which we generalize Landauer's principle to the quantum domain to illuminate the situation further. The result of these considerations is that there is a difference between the information that went into a mixed state and the accessible information that is left after the preparation of the state.
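The gap between invested and accessible information shows up as soon as the ensemble contains non-orthogonal states. The following sketch uses a hypothetical ensemble, $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$ with equal probabilities, chosen only for illustration, and compares the Shannon entropy of the probabilities with the von Neumann entropy of the resulting density matrix.

```python
import math

def entropy_bits(probabilities):
    """-sum q log2 q, skipping zero entries."""
    return sum(-q * math.log2(q) for q in probabilities if q > 1e-15)

def eigenvalues_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    half_tr = (m[0][0] + m[1][1]) / 2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(half_tr ** 2 - det, 0.0))
    return [half_tr + root, half_tr - root]

probs = [0.5, 0.5]
s = 1 / math.sqrt(2)
states = [[1.0, 0.0], [s, s]]  # hypothetical, non-orthogonal ensemble

# rho = sum_i p_i |psi_i><psi_i| for real state vectors.
rho = [[sum(p * v[i] * v[j] for p, v in zip(probs, states))
        for j in range(2)] for i in range(2)]

H = entropy_bits(probs)                 # information invested per letter
S = entropy_bits(eigenvalues_2x2(rho))  # von Neumann entropy of rho
```

Here one full bit per letter is invested, but the von Neumann entropy of the resulting state is only about 0.60 bits.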



B. Erasing classical information from quantum 
states: Landauer's principle revisited 

In the previous subsection we have discussed the amount of classical information that goes into the creation of a mixed state. But an obvious question has not been discussed yet: how do you erase the classical information encoded in a quantum mixed state? In section II C, we explained how to erase one bit encoded in a partitioned box filled with a one-molecule gas. All you have to do in this simple case is to remove the partition and compress the gas on one side of the box (say the right) independently of where it was before. This procedure erases the classical state of the binary device and the bit of information encoded in it. If the compression is carried out reversibly and at constant temperature, then the total change of thermodynamical entropy is given by $k\ln 2$, the minimum amount allowed by Landauer's principle. In this sense the erasure is optimal. What we are looking for in this section is a procedure for the erasure of the state of quantum systems. We will first present a direct generalization of the classical erasure procedure and then follow this up with a more general procedure that applies directly to both classical and quantum systems. These results will then be used to show that the accessible information in a quantum state $\rho$ created from an ensemble of pure states is equal to $S(\rho)$.



1. Erasure involving measurement 

We know from the previous section that the information content of a pure state is zero. Therefore, all we need to do to erase the information encoded in a mixed quantum state is to return the system to a fixed pure state called the standard state. We show how to achieve this in the context of an example.

Imagine you want to erase the information encoded in quantum systems in the mixed state $\rho = \sum_i p_i |e_i\rangle\langle e_i|$, where the $|e_i\rangle$ are the energy eigenstates. You start by performing measurements in the energy eigenbasis. After the measurement is performed, each system will indeed be in one of the pure states $|e_i\rangle$ and we have a classical record describing the measurement outcomes. If the density operator represents the preparation procedure of two-level atoms and we measure their energies, the classical measurement record would be a set of partitioned boxes storing a list of 0s and 1s labeling the energy of the ground state or the excited state for each atom measured. Now we can apply a unitary transformation and map the state $|e_i\rangle$ onto the standard state $|e_0\rangle$ for each atom on which a measurement has been performed (see first step in figure 8).

FIG. 8. Particles described by a quantum state $\rho$ arrive and are measured in a basis $|e_i\rangle$, giving the outcome $i$ with probability $p_i$. Given the outcome, each of the particles can be rotated into the pure state $|e_0\rangle$. The remaining classical list has to be erased as well. This generates $kT\ln 2\, H(\{p_i\})$ of heat. This procedure can be optimized if one measures in the eigenbasis of $\rho$, in which case one generates $kT\ln 2\, S(\rho)$ of heat.

Naively, one could think that this completes the erasure, because we have reset the quantum systems to a fixed standard state $|e_0\rangle$. However this is not true, because we are still holding the classical measurement records, so the erasure is still not complete. We need one more step, namely to erase the classical measurement record using the classical procedure discussed above. In the example of figure 8, this amounts to compressing each of the partitioned boxes where the list of 0s and 1s was encoded. This process will generate an amount of thermodynamical entropy not less than $k\ln 2$ per bit. In general we have that $k\ln 2\, S(\rho) \le k\ln 2\, H(\{p_i\})$, as pointed out in the previous section. The optimal erasure procedure, i.e. the one that creates the least amount of heat, is the one where the quantum measurements are made in the basis of the eigenstates of $\rho$, so that the Shannon entropy equals the von Neumann entropy as discussed in section IV A.



To sum up, the protocol described above relies on a 
quantum mechanical measurement followed by a unitary 
transformation and the erasure of the classical measure- 
ment record. While this protocol is a perfectly acceptable 
erasure procedure, it consists of two conceptually differ- 
ent steps and one may wonder whether there is a simpler 
method that does not involve the explicit act of measur- 
ing the quantum system. 



2. Erasure by thermal randomization 

One such elegant way to erase information, which has been introduced by Lubkin [30,31], is by thermal randomization. Simply stated, you have to place the quantum system that is to be erased into contact with a heat bath at temperature $T$. The laws of statistical mechanics teach us that when thermal equilibrium is reached, there will be an uncertainty about the energy state the system is in. The origin of this uncertainty is classical because it is induced by thermal fluctuations. This situation of lack of knowledge of the preparation procedure for the quantum state is equivalent to the example of the oven considered in section III C 1. The state of the system can therefore be written as a density operator $\omega$ given by

$$\omega = \frac{e^{-\beta H}}{Z} = \sum_i \frac{e^{-\beta E_i}}{Z} |e_i\rangle\langle e_i| \qquad (47)$$



where $\beta = 1/kT$ and $H$ is the Hamiltonian of the system, whose eigenstates and eigenvalues are $|e_i\rangle$ and $E_i$ respectively. The number $Z$ is the partition function of the system and can be calculated from $Z = \mathrm{tr}\{e^{-\beta H}\}$. For example, the system can be in its ground state with probability $p_0$ given by the Boltzmann distribution:

$$p_0 = \frac{e^{-\beta E_0}}{Z} \qquad (48)$$



The exponential dependence of the probabilities in the equation above implies that, if the system has a sufficiently large level spacing compared with $kT$ (i.e. $E_0$ is much smaller than the other energy levels), it will almost surely be in its ground state. Thus, if a measurement is made, the result will almost certainly be that the system is in its ground state. In other words, the thermal state $\omega$ can be made arbitrarily close to a standard pure state $|e_0\rangle$ by greatly reducing the presence of the other pure states $|e_i\rangle$ in the thermal preparation procedure. In practice, this is exactly what we wanted: a procedure that always resets our system, originally in the mixed state $\rho$, to a standard state (independent of the initial state), e.g. the ground state $|e_0\rangle$. Also note that this erasure procedure never requires any measurement to be performed, so we do not need to be concerned with erasing the classical measurement record, as in the previous method.
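A small numerical sketch makes the role of the level spacing concrete. The energies and temperatures below are hypothetical and given in arbitrary units with the same unit for $E_i$ and $kT$; the function name is illustrative.

```python
import math

def thermal_populations(energies, kT):
    """Boltzmann probabilities exp(-E_i/kT)/Z for a list of energy levels.

    Energies are shifted by their minimum before exponentiating; this leaves
    the probabilities unchanged and avoids numerical overflow.
    """
    e_min = min(energies)
    weights = [math.exp(-(E - e_min) / kT) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

# Two-level system with ground-state energy 0 and a gap of 1 or 10 units of kT.
small_gap = thermal_populations([0.0, 1.0], kT=1.0)
large_gap = thermal_populations([0.0, 10.0], kT=1.0)
```

With a gap equal to $kT$ the ground state is occupied only about 73% of the time, whereas with a gap of $10\,kT$ the occupation exceeds 99.99%, so the thermal state is very nearly the standard pure state.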

Furthermore, we can readily calculate the net amount of thermodynamical entropy generated in erasing the quantum mixed state $\rho$ where the classical information is encoded. We proceed by computing first the change of thermodynamical entropy in the system and then the change of thermodynamical entropy of the environment. All the steps in this derivation are reproduced and motivated. Readers who do not feel comfortable with the formalism of density operators explained in the previous sections can skip this derivation and jump to the result in equation 54.

The mixed state $\rho$ is generated by a source that randomly produces pure states $|e_i\rangle$ with probability $p_i$. Each quantum system in such a pure state $|e_i\rangle$ is brought into contact with the heat bath and thermalizes into the state $\omega$ (see figure 9).

FIG. 9. The quantum particles, described by the average state $\rho$, are brought into contact with a thermal heat bath and are allowed to relax into thermal equilibrium. The resulting change of heat depends on the temperature of the heat bath and its optimal value is given by $kT\ln 2\, S(\rho)$.

We remind the reader that the entropy of the system before the thermalization procedure takes place is zero because the system is in one of the pure states $|e_i\rangle$ (see equation 46 and the discussion below it). Therefore, in each of these contacts, the thermodynamical entropy of the system increases by the same amount $k\ln 2\, S(\omega)$, where $S(\omega)$ is the von Neumann entropy of the thermal state and $k\ln 2$ is the conversion factor between information and thermodynamical entropy, so that we have

$$\Delta S_{sys} = k\ln 2\, S(\omega) . \qquad (49)$$



Now we proceed to discuss the change in the thermodynamical entropy of the heat bath. The latter is given in terms of the heat exchanged by the heat bath and its temperature $T$ by the well known thermodynamical relation $\Delta S_{bath} = \Delta Q_{bath}/T$. The easiest way to attack this problem is by using the observation that the change of heat in the heat bath $\Delta Q_{bath}$ is equal and opposite to the change of heat in the system $\Delta Q_{system}$. Furthermore, the first law of thermodynamics can be used to write $\Delta Q_{system}$ as the change in the internal energy of the system $\Delta U_{system} = U_{final} - U_{initial}$ (i.e. the procedure can be carried out so that the work required is arbitrarily close to 0). One can summarize what is stated above in the equation:

$$\Delta S_{bath} = -\frac{\Delta U_{system}}{T} = -\frac{U_{final} - U_{initial}}{T} \qquad (50)$$



We can now rewrite the initial and final energy of the system as the expectation value of the Hamiltonian $H$ of the system calculated in the initial state $\rho$ and in the final thermal state $\omega$. The formula to use is given in equation 38. Once this is done, equation 50 can be recast in the following form:

$$\Delta S_{bath} = -\frac{\mathrm{tr}\{\omega H\} - \mathrm{tr}\{\rho H\}}{T} = -\frac{\mathrm{tr}\{(\omega - \rho)H\}}{T} \qquad (51)$$



The expression in equation 51 can be further elaborated by substituting the operator $H$ with the corresponding expression $-kT\ln(Z\omega)$, obtained after solving the first equality in equation 47 with respect to $H$:

$$\Delta S_{bath} = k\,\mathrm{tr}\{(\omega - \rho)\ln(Z\omega)\} = k\,\mathrm{tr}\{(\omega - \rho)\ln\omega\} + k\ln Z\,\mathrm{tr}\{\omega - \rho\} . \qquad (52)$$

In the previous steps we used the properties of the logarithm and the fact that a constant like $\ln Z$ or $kT$ can be "taken out of the trace". The last term in equation 52 vanishes because $\mathrm{tr}\{\rho\} = \mathrm{tr}\{\omega\} = 1$, since the trace of a density operator is always equal to 1. Also the first term can be expanded as

$$\Delta S_{bath} = k\,\mathrm{tr}\{\omega\ln\omega\} - k\,\mathrm{tr}\{\rho\ln\omega\} = -k\ln 2\, S(\omega) - k\,\mathrm{tr}\{\rho\ln\omega\} \qquad (53)$$



Note the factor $\ln 2$, which converts the natural logarithm to the base-2 logarithm adopted in the definition of the von Neumann entropy. We therefore reach the final result that the total change of thermodynamical entropy in system and environment in our procedure is given by

$$\Delta S_{tot} = \Delta S_{sys} + \Delta S_{bath} = -k\,\mathrm{tr}\{\rho\ln\omega\}, \qquad (54)$$



where $\omega$ is the state of the system after having reached
thermal equilibrium with a heat bath at temperature $T$.

This entropy of erasure can be minimized by choosing
the temperature of the heat bath such that the thermal
equilibrium state of the system is $\rho$, i.e.

$$\min\{\Delta S_{tot}\} = S(\rho) = -\mathrm{tr}\{\rho\log\rho\} , \qquad (55)$$

which, in units of $k\ln 2$, equals the von Neumann entropy
of $\rho$. Equation (55) restates Landauer's principle for
quantum systems in which classical information is encoded.
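
As a numerical illustration of equations (54) and (55), the following Python sketch scans over candidate thermal states and checks that the entropy of erasure is smallest when the thermal state coincides with $\rho$. It is only a sketch under a simplifying assumption that is not made in the text: $\rho$ and $\omega$ are taken to be diagonal in the same basis, so that $-\mathrm{tr}\{\rho\ln\omega\}$ reduces to a sum over diagonal entries. The function names are hypothetical.

```python
import math

def erasure_entropy(p, q):
    """Total entropy of erasure -tr{rho ln omega}, in units of k.

    Assumption for illustration: rho = diag(p, 1-p) and the thermal
    state omega = diag(q, 1-q) are diagonal in the same basis.
    """
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))

def von_neumann_entropy_bits(p):
    """S(rho) in bits for rho = diag(p, 1-p)."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)

p = 0.9
# Scan candidate thermal states omega = diag(q, 1-q) on a grid.
candidates = [i / 1000 for i in range(1, 1000)]
best_q = min(candidates, key=lambda q: erasure_entropy(p, q))
minimum_in_bits = erasure_entropy(p, best_q) / math.log(2)

print(best_q)                       # thermal state minimizing the erasure entropy
print(minimum_in_bits)              # the minimum, in units of k ln 2
print(von_neumann_entropy_bits(p))  # S(rho), for comparison
```

Any other choice of `q` gives a strictly larger value, which is the statement that the erasure is optimal when the bath temperature is chosen such that $\omega = \rho$.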

From the last section we remember that the amount of
classical information invested in the creation of the state
$\rho$ was never smaller than the von Neumann entropy $S(\rho)$,
a value which can always be achieved. This left open
the question of how much classical information is actually
still accessible after the creation of $\rho$. Having seen above
that the entropy of erasure of a quantum state $\rho$ can be
as small as the von Neumann entropy, we conclude from
Landauer's principle that the accessible information in
the state $\rho$ cannot be larger than its von Neumann entropy.
Therefore it becomes clear that the only possible
quantity to describe the classical information content of
a mixed state that has been prepared from an ensemble
of pure states is given by the von Neumann entropy.



C. Classical information transmitted through a noisy 
quantum channel 



In this section we will evaluate how much classical
information can be transmitted reliably down a noisy quantum
channel. The reader may remember that we considered
the classical analogue of this problem in section II F.

Imagine that Alice wants to transmit a message to Bob.
This message is written in an alphabet composed of $N$
letters $a_i$, each occurring with probability $p_i$. Alice
decides to encode each letter $a_i$ in the pure quantum state
$|\psi_i\rangle$. Alice can transmit the letter $a_i$ simply by
sending a particle in the state $|\psi_i\rangle$ via a physical channel,
like an optical fiber. When Bob receives the particle, he
does not know which pure state it is in. Bob's incomplete
knowledge of the state of the particle is represented
by the mixed state $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. When Bob reads
the state of the particle he will have gained some useful
information to guess which letter Alice had encoded.
The information encoded in the mixed state of the quantum
carrier is equal to the von Neumann entropy $S(\rho)$
as explained in the last section. If the states are
orthogonal, then the von Neumann entropy reduces to the
Shannon entropy of the probability distribution $\{p_i\}$
because all the quantum states are distinguishable and the
situation is analogous to the classical case. If the states
are non-orthogonal then the von Neumann entropy will
be less than the Shannon entropy. The information transfer
is degraded by the lack of complete distinguishability
between the pure states of the carriers in which the
information was encoded at the source. This feature has no
classical analogue and is sometimes referred to as intrinsic
quantum noise. The name is also justified by the fact
that this noise is not induced by the environment or any
classical uncertainty about the preparation procedure of
the carriers' states.
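
The effect of non-orthogonality can be made concrete for the simplest case of two signal states. The Python sketch below is an illustration only: it assumes just two pure states $|\psi_1\rangle$, $|\psi_2\rangle$ with overlap $|\langle\psi_1|\psi_2\rangle|$, for which the two non-zero eigenvalues of $\rho$ are $\frac{1}{2}\left(1 \pm \sqrt{1 - 4p_1p_2(1 - |\langle\psi_1|\psi_2\rangle|^2)}\right)$. The function names are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def entropy_two_pure_states(p1, overlap):
    """Von Neumann entropy (bits) of rho = p1|psi1><psi1| + (1-p1)|psi2><psi2|.

    `overlap` is |<psi1|psi2>|, between 0 (orthogonal) and 1 (identical).
    """
    p2 = 1.0 - p1
    det = p1 * p2 * (1.0 - overlap ** 2)        # product of the two eigenvalues
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    return shannon_entropy([(1.0 + root) / 2.0, (1.0 - root) / 2.0])

p1 = 0.5
for overlap in (0.0, 0.5, 1.0 / math.sqrt(2.0), 1.0):
    print(overlap, entropy_two_pure_states(p1, overlap), shannon_entropy([p1, 1.0 - p1]))
```

For orthogonal states the two entropies coincide, and the von Neumann entropy drops towards zero as the two states become harder to distinguish.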

We now wonder what happens when the channel itself
is noisy (see figure 10). For example, the optical fiber
in which the carriers travel could be coupled to an environment,
or an eavesdropper, Eve, could be interacting with the
carriers. This extra noise is not intrinsic to the preparation
of the pure states at the source, but it is induced by the
environment. One can view the transmission through a
noisy channel in the following way.



[Figure 10 diagram: Alice encodes the letters a (00), b (01), c (10), d (11); in the channel the second index is lost ("Loss of information to environment"); Bob receives the output.]

FIG. 10. The basics of information transmission. Alice
takes the letters a, b, c, d (which can also be written in
binary as 00, 01, 10, 11) and encodes them in pure quantum
states $|\psi_{ij}\rangle$. These states are sent through the channel where
the environment interacts with them. Here the information
about the second index is lost, leading to mixed states $\rho_0$ and
$\rho_1$. Bob receives these mixed states and has lost some of the
original information as he cannot distinguish between a and
b and between c and d.

Initially the sender, Alice, holds a long classical message.
She encodes letter $i$ (which appears with probability
$p_i$) of this message into a pure state that, during the
transmission, is turned into a possibly mixed quantum
state $\rho_i$ due to the incomplete knowledge of the environment
or of Eve's actions. These quantum states are then
passed on to the receiver, Bob, who then has the task of
inferring Alice's classical message from these quantum states.
The upper bound for the capacity of such a transmission,
i.e. the information $I$ that Bob can obtain about
Alice's message per sent quantum state, is known as the
Holevo bound

$$I \le I_H = S(\rho) - \sum_i p_i S(\rho_i) . \qquad (56)$$

The rigorous proof of this result is rather complicated [39].
The aim of the next section is to
justify Holevo's bound from the assumption of the
validity of Landauer's principle.
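
To get a feeling for the size of the bound in equation (56), the following Python sketch evaluates $I_H$ for a situation modelled on figure 10. It rests on assumptions that are not stated in the text: the four signal states are taken to be mutually orthogonal and equally likely, so that all density operators are diagonal in one basis and can be represented by lists of eigenvalues. The function names are hypothetical.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a list of eigenvalues (or probabilities)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def holevo_bound_diagonal(priors, diagonals):
    """I_H = S(rho) - sum_i p_i S(rho_i) for states diagonal in one common basis.

    priors    : probabilities p_i of the letters
    diagonals : diagonals[i] is the list of eigenvalues of rho_i
    """
    dim = len(diagonals[0])
    average = [sum(p * d[k] for p, d in zip(priors, diagonals)) for k in range(dim)]
    return entropy_bits(average) - sum(p * entropy_bits(d) for p, d in zip(priors, diagonals))

# Noiseless channel: Bob receives one of four orthogonal pure states.
noiseless = holevo_bound_diagonal(
    [0.25, 0.25, 0.25, 0.25],
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
)

# Noisy channel as in figure 10: the second index is lost, leaving
# rho_0 (mixture of a and b) and rho_1 (mixture of c and d).
noisy = holevo_bound_diagonal(
    [0.5, 0.5],
    [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]],
)

print(noiseless, noisy)
```

Under these assumptions the loss of the second index halves the bound on the information that Bob can extract per carrier.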



1. Holevo's bound from Landauer's principle 

The idea behind the derivation of the Holevo bound
from Landauer's principle is to determine an upper
bound on the entropy that is generated when Bob erases
the information that the message system carries in its
state $\rho_i$. In this way we directly obtain an upper bound
on the information received by Bob, because we know
from Landauer's principle that the information received
is always less than or equal to the entropy generated when it
is erased (see section IV B).

Let us begin by assuming that Alice uses an alphabet
of letters $(i, a)$ that are enumerated by the two integers
$i$ and $a$. We use this form of double indices to make the
formulation of the following analysis simpler, but apart
from that it has no deeper meaning. The letter $i$ appears
with probability $p_i$ and, given $i$, $a$ appears with the
probability $r^i_a$. Alice encodes her message in the following
way. Given she wants to send letter $(i, a)$, which occurs
with probability $p_i r^i_a$, she encodes it into the pure state
$|\phi^i_a\rangle$. Therefore $\rho_i = \sum_a r^i_a |\phi^i_a\rangle\langle\phi^i_a|$. Now these
quantum states are inserted into the quantum channel and
they are subjected to an interaction with the environment
or an eavesdropper Eve. The effect of this interaction
is that the systems lose their correlation to the
specific values of $a$ or, in other words, the information
about $a$ is lost, and we are left with a certain degree of
correlation between the integers $i$ and the mixed states
$\rho_i$. Evidently the lost information about $a$ has leaked
into the environment or to Eve and this information is
not available to Bob anymore. In the following we would
like to compute, using Landauer's principle, how much
information has actually been lost. To this end we
construct an optimal erasure procedure and compute the
thermodynamical heat it generates.



2. Direct erasure

As explained above, message letter $(i, a)$, which appears
with probability $p_i r^i_a$, is encoded in state $|\phi^i_a\rangle$. We will
now delete the information encoded in these pure states by
bringing them into contact with a heat bath. We choose
the temperature of this heat bath such that the thermal
equilibrium state of the message system is $\rho = \sum_i p_i \rho_i$.
This ensures that the erasure is optimal, in the sense that
it produces the smallest possible amount of heat. Following
the analysis of Lubkin's erasure in section IV B, the
entropy of erasure is given by

$$\Delta S_{er} = S(\rho) . \qquad (57)$$

Note that all information has been deleted because now
every quantum system is in the same state $\rho$, so that there
is no correlation between the original letter $i$ and the
encoded quantum state left after the erasure!



3. Two step erasure

Now let us compute the entropy of erasure in going
from the pure states $|\phi^i_a\rangle$ into which Alice encoded her
message initially to the mixed states $\rho_i$ that Bob obtains
after the carriers have passed the channel. This is the
first step in our erasure procedure and determines the
amount of information lost to the environment or the
eavesdropper.

For a fixed $i$, which appears with probability $p_i$, we
place the encoded pure states into contact with a heat
bath. The temperature $T$ of the heat bath is chosen such
that the thermal equilibrium state of the message system
is $\rho_i$. Again this choice ensures that the erasure is optimal.
According to our analysis of the Lubkin erasure in
section IV B, the entropy of erasure is then found to be

$$\begin{aligned}
\Delta S_{er}^{(1)} &= -\sum_i p_i \sum_a r^i_a \,\mathrm{tr}\{|\phi^i_a\rangle\langle\phi^i_a| \log\rho_i\} \\
&= -\sum_i p_i \,\mathrm{tr}\{\rho_i \log\rho_i\} \\
&= \sum_i p_i S(\rho_i) . \qquad (58)
\end{aligned}$$

After this first step in the erasure procedure there is still
some information left in the physical systems, as the letter
$i$ of the classical message is correlated with the state
$\rho_i$ of the quantum system. Therefore some information
is available to Bob. In fact, this is exactly the situation
Bob is in after he has received a message encoded
in mixed states $\rho_i$. To obtain a bound on the
information that Bob is now holding, we need to find a
bound on the entropy of erasure of his quantum systems.

Now we would like to determine the entropy of erasure
of the signal states $\rho_i$ that Bob has received through the
channel. In order to carry out this second step of the
erasure procedure we place each of Bob's systems, which
is in one of the states $\rho_i$ with probability $p_i$, into contact
with a heat bath such that the thermal equilibrium state
of the message system is $\rho$. As the average state of the
systems is $\rho = \sum_i p_i \rho_i$, we expect the erasure to be optimal
again. We can see easily that this second step of
erasure generates an amount of entropy that is just the
difference between the entropy of erasure of the first procedure
(direct erasure) and that of the first step of the second procedure.
Therefore the entropy of erasure of Bob's systems, which
are in one of the states $\rho_i$, is

$$\Delta S_{er}(\mathrm{Bob}) = S(\rho) - \sum_i p_i S(\rho_i) . \qquad (59)$$

As the largest possible amount of information available
to the receiver Bob is bounded by his entropy of erasure
we have

$$I \le \Delta S_{er}(\mathrm{Bob}) = S(\rho) - \sum_i p_i S(\rho_i) = I_H . \qquad (60)$$

Therefore we have obtained the Holevo bound on the
information in the states $\rho_i$ which appear with probabilities
$p_i$. The Holevo bound completes our answer to the first
of the three questions posed in the introduction. This is
the last result that we prove in this article about classical
information. We now turn our attention to the newly
developed subject of quantum information theory.



V. THE BASICS OF QUANTUM INFORMATION
THEORY

The concept of quantum information represents a rad- 
ical departure from what we have encountered so far. In 
the next few sections, we will explore some of its proper- 
ties by using Landauer's erasure principle. But first we 
want to discuss why the term quantum information has 
been introduced and what exactly it means. 



A. Quantum information: motivation of the idea 

The choice of the bit as the fundamental unit of in- 
formation is reasonable both logically and physically. In 
fact, right from the outset, our definition of information 
content of an object has focused on the fact that informa- 
tion is always encoded in a physical system. Classically, 
the simplest physical system in which information can 
be encoded is a binary device like a switch that can be 
either open (1) or closed (0). However, as technology 
shrinks more and more, we need to abandon the macro- 
scopic world in favor of devices that are sufficiently small 
to deserve the name of quantum hardware. To some ex- 
tent, the quantum analogue of a classical binary device 
is a two level quantum system like a spin-half particle. 
Just as the classical device, it possesses two perfectly dis- 
tinguishable states (spin-up and spin-down) and as such 
it is the simplest non-trivial quantum system. However, 
it differs in one important way from the classical switch. 
The general state of a spin-half particle can be in an
arbitrary superposition of the state $|\uparrow\rangle_z$ corresponding
to the spin of the particle being oriented upwards, say
in the positive z direction, and of the state $|\downarrow\rangle_z$
corresponding to the spin oriented downwards:

$$|\psi\rangle = \alpha|\uparrow\rangle_z + \beta|\downarrow\rangle_z \qquad (61)$$

where $\alpha$ and $\beta$ are two arbitrary complex numbers such
that $|\alpha|^2 + |\beta|^2 = 1$. $|\alpha|^2$ and $|\beta|^2$ are the probabilities for
finding the particle spin-up or spin-down in a measurement
of the spin along the z direction. By analogy with
the classical bit, we define a qubit as the information
encoded in this two-level quantum system. An example will
elucidate the motivation behind this definition.

Imagine that you are holding a complex quantum system
and you want to send instructions to a friend of yours
so that he can reconstruct the state of the object with
arbitrary precision. We have previously mentioned that,
if the necessary instructions can be transmitted in the
form of $n$ classical bits, then the classical information
content of the object is $n$ bits. Sending $n$ bits of classical
information is not difficult. We just need to send
a series of $n$ switches and our friend will read a 0 when
the switch is closed and a 1 when it is open. He will
then process this information to recreate the state of a
complex quantum object like $n$ interacting spin-$\frac{1}{2}$ particles.
All this is fine, but it entails a number of problems.
Firstly, the set of instructions may be very large even if
we only want to recreate a single qubit, simply because
the complex amplitudes are continuous quantities (each a
pair of real numbers). More importantly though, we are
somewhat inconsistent in trying to
reduce the state of a quantum system to classical binary 
choices. It would be more logical to transmit the quan- 
tum state of the composite object by sending "quantum 
building blocks" . For example, we could try to send our 
instructions directly in the form of simple two level quan- 
tum systems (qubits) rather than bits encoded in classi- 
cal switches. The hope is that, if we prepare the joint 
state of these qubits appropriately, our friend will be able 
to manipulate them somehow and finally reconstruct the 
state of the complex quantum object. Ben Schumacher 
p6| , ^5| proved that this is indeed possible and he also 
provided a prescription to calculate the minimum num- 
ber of qubits m that our friend requires to reconstruct 
an arbitrary quantum state. The existence of this proce- 
dure allows us to establish an analogy with the classical 
case and say that the quantum information content of 
the object is m qubits. In this sense, the qubit is the ba- 
sic unit of quantum information in very much the same 
way as the bit is the unit of classical information. We 
ask the reader to be patient and wait for later sections,
namely section V C, in which we will explain in more detail
Schumacher's reasoning and expand on some of the
remarks made above. The previous arguments should
anyway convince the reader that, although the ideas of
qubit and bit have a common origin, it is worth exploring
the important differences between the two.



B. The qubit 

The key to understanding the differences between quantum
and classical information is the principle of superposition.
Our discussion below will be articulated in two
points. We first assess the implications of the superposition
principle for the state of a single spin-half particle (1
qubit) and then we move on to consider the case of a quantum
system composed of $n$ spin-half particles ($n$ qubits).



1. A single qubit 

The concept of superposition of states, which plays a
crucial role in the definition of the state of a spin-half
particle, has no analogue in the description of a classical
switch which is either in one state or in the other, but
not in both! Naively, one could think that the probabilistic
interpretation of the coefficients $\alpha$ and $\beta$ in the
superposition of states given by equation (61) solves all the
problems. In fact, if $|\alpha|^2$ and $|\beta|^2$ are the probabilities for
finding the particle spin-up or spin-down after the spin is
measured along the z direction, then a qubit is nothing
more than a statistical bit, that is, a random variable
which can be either 0 or 1 with given probabilities $|\alpha|^2$
or $|\beta|^2$ respectively. This conclusion is wrong!

The probabilistic interpretation of equation (61) given
above is not the full story on the qubit because it
concentrates only on the modulus squared of the complex
numbers $\alpha$ and $\beta$. This amounts to throwing away some
degrees of freedom that are contained in the complex
phases of the amplitudes. We have shown before that the qubit is
mathematically described by a vector in a two dimensional complex
vector space (the Hilbert space). This state vector
can be visualized as a unit vector in a three-dimensional
space, i.e. pointing from the origin of the coordinate system
to the surface of a unit sphere, known as the Bloch
sphere [|l5|,^7j (see figure 11b).
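
The mapping from the amplitudes $\alpha$, $\beta$ to a point on the Bloch sphere can be written down explicitly. The Python sketch below uses the standard convention $x = 2\,\mathrm{Re}(\alpha^*\beta)$, $y = 2\,\mathrm{Im}(\alpha^*\beta)$, $z = |\alpha|^2 - |\beta|^2$, which is an assumption here since the text does not spell out the formula; the function name is hypothetical.

```python
import math

def bloch_vector(alpha, beta):
    """Bloch vector (x, y, z) of the state alpha|up> + beta|down>.

    The amplitudes are normalized first, so any non-zero pair is accepted.
    """
    alpha, beta = complex(alpha), complex(beta)
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    a, b = alpha / norm, beta / norm
    coherence = a.conjugate() * b      # carries the relative phase
    return (2.0 * coherence.real, 2.0 * coherence.imag, abs(a) ** 2 - abs(b) ** 2)

s = 1.0 / math.sqrt(2.0)
# Three states with identical probabilities |alpha|^2 = |beta|^2 = 1/2
# but different relative phases between the two amplitudes.
for alpha, beta in [(s, s), (s, -s), (s, 1j * s)]:
    print(bloch_vector(alpha, beta))
```

All three states have $z = 0$, i.e. the same measurement statistics along the z direction, yet they correspond to different points on the equator of the sphere. This is the information that the purely probabilistic reading of equation (61) throws away.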



[Figure 11 panels: (a) Classical bit; (b) Quantum bit.]





FIG. 11. The Bloch sphere representation of (a) a classi- 
cal bit in which the vector can only point up or down; (b) a 
qubit in which the vector is allowed to point in any direction. 
This illustrates that a qubit possesses more freedom than a 
classical bit when information is processed. 

This can be contrasted with a classical bit which is
simply a discrete variable that can take either of the
values 0 or 1. A classical bit is thus shown in the same
diagram as a unit vector along the z axis, pointing
either up or down (see figure 11a). This makes intuitive
the idea that to some extent there is "more room for
information" in a qubit than in a bit. However, the ability
of the qubit to store more information in its "larger
space" is limited to the processing of information. It is
in fact impossible to fully access this information (i.e. the
whole of the spherical surface) in a measurement. More
explicitly, whenever we manipulate a spin-up particle we
do act on all its degrees of freedom (i.e. we change both
the amplitude and the relative phase of the two complex
coefficients $\alpha$ and $\beta$) so that the vector representing the
qubit can be rotated freely to any point on the surface of
the sphere. However, when we try to measure the state
of the system we have to choose a basis (i.e. a direction)
in which the spin measurement has to be done. That
amounts to fixing a direction in space and asking only
whether the projection of the state vector in that direction
is oriented parallel or anti-parallel. In other words,
when we try to extract information from the spin-half
particle we never recover a full qubit (i.e. the quantum
state of the system). We know from section III B 3 that it
is impossible to extract the complex coefficients $\alpha$ and $\beta$
with a single measurement. In fact, the information one
can extract from the measurement is just one classical
bit. It is remarkable to note that there is a large fraction
of information in a qubit that can be processed but
not accessed in a measurement. Therefore, the difference
between a single qubit and a classical bit is not merely
quantitative, as figure 11 suggests, but also qualitative.
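
The restriction to a single measurement direction can be illustrated with a short calculation. The Python sketch below assumes the standard rule that a spin measurement along a unit vector $\vec{n}$ finds the spin parallel to $\vec{n}$ with probability $\frac{1}{2}(1 + \vec{r}\cdot\vec{n})$, where $\vec{r}$ is the Bloch vector of the state; this formula and the function name are not taken from the text.

```python
import math

def prob_parallel(alpha, beta, axis):
    """Probability of finding the spin parallel to the unit vector `axis`
    for the normalized state alpha|up_z> + beta|down_z>."""
    alpha, beta = complex(alpha), complex(beta)
    coherence = alpha.conjugate() * beta
    bloch = (2.0 * coherence.real, 2.0 * coherence.imag,
             abs(alpha) ** 2 - abs(beta) ** 2)
    return 0.5 * (1.0 + sum(r * n for r, n in zip(bloch, axis)))

s = 1.0 / math.sqrt(2.0)
Z_AXIS, X_AXIS = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)

# Two different states with the same |alpha|^2 and |beta|^2.
for alpha, beta in [(s, s), (s, -s)]:
    print(prob_parallel(alpha, beta, Z_AXIS), prob_parallel(alpha, beta, X_AXIS))
```

Measured along z the two states are indistinguishable, while a measurement along x separates them. In either case a single measurement yields only one binary outcome, parallel or anti-parallel.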



2. n qubits 

We have hopefully clarified what is meant by a qubit.
We will now expand on our knowledge of quantum
information by explaining what people mean by having or
transmitting $n$ qubits. We already know that $n$ qubits
is nothing more than a fancy way of saying $n$ two level
quantum systems. So the point is really to understand
the features displayed by the joint system of $n$ two level
quantum systems, possibly interacting with one another.
In section III B 5, we saw that, when you abandon the
safe territory of single particle quantum mechanics, you
immediately stumble over the remarkable phenomenon of
quantum entanglement that makes the quantum description
of a composite object very different from its classical
description. Please note that we are not contrasting
macroscopic objects obeying the laws of classical physics
(say three beams of light), with microscopic objects obeying
the laws of quantum mechanics (say three photons).
Instead, we are remarking that even if you choose macroscopic
objects, say three beams of light, and you decide
never to mention the word photon, you will still be able
to come up with states of the joint macroscopic system
that are entangled and therefore completely beyond classical
intuition. Let us be even more explicit. Imagine
that you have a classical physicist right in front of you
and you ask him the following question:

You: How many complex numbers do you need to 
provide in order to specify the joint state of a system 
comprised of three polarized beams of light? 

The classical physicist will probably find the expres- 
sion joint state rather peculiar, but he will still answer 
your question on the basis of his knowledge of classical 
electrodynamics. 

Classical physicist: To completely describe the state
of a composite system (i.e. one composed of many subsystems)
you just need to specify the state of each subsystem
individually. So if you have $n$ arbitrarily polarized light
beams, you need $2n$ complex numbers to describe completely
the joint system, 2 complex parameters for each
of the $n$ systems. In fact the state of each beam of light
can be described by a superposition of, say, horizontally
and vertically polarized components:

$$|\psi\rangle = A_V e^{i\theta_V}|V\rangle + A_H e^{i\theta_H}|H\rangle . \qquad (62)$$



What we mean is only to prepare a beam of light in
a superposition of horizontally and vertically polarized
components. Instructions given in this form should be
understandable by a classical physicist, too. Furthermore
the two complex coefficients in equation (62) can be
interpreted as follows: $A_V$ and $A_H$ are the moduli of the
amplitude, corresponding to the field strength, and $\theta_V$
and $\theta_H$ are the phases of the vertically and horizontally
polarized components. An example is light that is polarized
at a 45 degree angle, which can also be viewed as an
equally weighted superposition of horizontally and vertically
polarized light with the same phase. The description
of three such beams of light will obviously require
$2 \times 3$ complex parameters.

[Figure 12 labels: "Hilbert space is large!"; "Classical state space is much smaller!"]

FIG. 12. Schematic picture of the whole Hilbert space,
including entangled states, and the smaller space comprising
only the disentangled states expected by a classical physicist.



Unfortunately, statements that seem obvious sometimes
turn out to be wrong. The reader who remembers
our discussion of entanglement in section III B 5 may
see where the problem with the argument above lies. In
order to describe an $n$-partite object quantum mechanically,
you need an enlarged Hilbert space spanned by
$2^n$ orthogonal state vectors. For example the joint state
of three beams of light is an arbitrary superposition of
the $2^3$ orthogonal state vectors, and therefore requires 8
complex coefficients, not 6. Why 8? Consider the state
vector $|HHV\rangle$ representing the state in which the first
and the second beams are horizontally polarized whereas
the third is vertically polarized. Here we used H and V,
rather than 1 and 0 as in section III B 5, but the logic
is the same. How many of those vector states can you
superpose? Well, each of the three entries in $|\ldots\rangle$ can be
either H or V so you have $2 \times 2 \times 2$ possibilities. Therefore
any quantum state can be written as the superposition of
these 8 vectors in an 8 dimensional Hilbert space. However,
as we saw in section III B 5, not every vector can
be factorized into three 2-dimensional vectors each describing
a single beam of light. If he insists on using only 6
parameters to describe a tripartite system, the classical
physicist will ignore many valid physical states that are
entangled! You may wonder how big that loss is. In other
words, how much of the Hilbert space of an $n$-partite system
is actually composed of entangled states? The answer
is pretty straightforward. Product states predicted
by classical thinking "live" in a set described by only
$2 \times n$ complex parameters, whereas the whole Hilbert space
for the joint state of $n$ beams of light has $2^n$ dimensions.
Formally stated, the state space of a quantum
many body system grows exponentially with the number
of components if you allow for entanglement among its
parts. The classical product states instead occupy only
an exponentially small fraction of its Hilbert space, as
shown in figure 12.
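
The counting argument above is easy to tabulate, and for two beams the difference between product and entangled states can be tested directly. The Python sketch below is illustrative only: it relies on the standard fact, not derived in the text, that a two-qubit pure state with amplitudes $a_{HH}, a_{HV}, a_{VH}, a_{VV}$ factorizes exactly when $a_{HH}a_{VV} - a_{HV}a_{VH} = 0$. The function names are hypothetical.

```python
import math

def parameter_counts(n):
    """Complex numbers needed for n two-level systems:
    (product-state description, full quantum description)."""
    return 2 * n, 2 ** n

def is_product_state(a_hh, a_hv, a_vh, a_vv, tol=1e-12):
    """True if the two-beam state sum_xy a_xy |xy> factorizes into
    (state of beam 1) x (state of beam 2)."""
    return abs(a_hh * a_vv - a_hv * a_vh) < tol

for n in (1, 2, 3, 10, 50):
    print(n, parameter_counts(n))

s = 1.0 / math.sqrt(2.0)
# Both beams polarized at 45 degrees: a product state.
print(is_product_state(0.5, 0.5, 0.5, 0.5))
# (|HH> + |VV>)/sqrt(2): cannot be written as a product.
print(is_product_state(s, 0.0, 0.0, s))
```

For three beams the two counts are 6 and 8, as in the text, and the gap widens exponentially as more systems are added.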



Going back to our starting point, we say that we are
able to hold and manipulate $n$ qubits when we can prepare
and keep $n$ beams of light, $n$ two-level atoms or $n$
spin-$\frac{1}{2}$ particles in a joint state $|\psi\rangle$ given by any arbitrary
superposition of the $2^n$ state vectors, which can take the
form

$$|\psi\rangle = \sum_{i_1, \ldots, i_n \in \{0,1\}} \alpha_{i_1 \ldots i_n} |i_1 \ldots i_n\rangle \qquad (63)$$

with $2^n$ complex amplitudes $\alpha_{i_1 \ldots i_n}$. The actual preparation
of such a state presents a tremendous experimental
task no matter which constituent subsystems you
choose. You need to carefully control and "engineer"
the interaction among all the constituent components to
choose the state you want and at the same time you have
to protect the joint state against environmental noise.
To date, this is possible with only a few qubits and
many people are skeptical about radical improvements in
the near future. The prospect of implementing quantum
computation, which requires manipulation of many qubits
to be effective, seems far beyond present capabilities.



C. The quantum information content of a quantum 
system in qubits 

We want to make up for the pessimistic tone that ended
the last section with the discussion of an interesting feature
of quantum information that might be useful in case
devices based on quantum information theory are ever
built. We will explain how an arbitrary quantum state
of a composite system comprised of $n$ interacting 2-level
atoms can be compressed and transmitted by sending a
number $m < n$ of qubits. As advertised in chapter 3,
this procedure justifies the use of the qubit as the unit
of quantum information and, by analogy with classical
data compression, partly justifies the otherwise misleading
name qubit. We proceed in close mathematical analogy
to the classical case studied in section II E and see
how well we can compress quantum states, i.e. how many
qubits are needed to describe a quantum state. We first
give a simple example that illustrates the key ideas, and
then we reiterate these ideas in a slightly more general
and formal way.



1. Quantum data compression: a simple example 

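Let us begin with the following very simple example,
which is in fact essentially classical, but displays
all the relevant ideas of the more general case. Consider
a quantum source that emits two-level systems with
probability $p_0 = 0.95$ in state $|0\rangle$ and with probability
$p_1 = 1 - p_0 = 0.05$ in the orthogonal state $|1\rangle$. Our knowledge
of this preparation procedure for a single qubit is
represented by the density operator $\rho$ given by

$$\rho = 0.95|0\rangle\langle 0| + 0.05|1\rangle\langle 1| . \qquad (64)$$

Note that the two states generated by the oven have
been chosen to be orthogonal for simplicity. We will consider
the more general case later. For the time being, let
us consider blocks of 7 qubits generated by the source
described above. Clearly any sequence of qubits in states
$|0\rangle$ and $|1\rangle$ is possible, but some are more likely than
others. In fact, typically you will find either a sequence
that contains only qubits in state $|0\rangle$ or sequences with
a single qubit in state $|1\rangle$ and all others in state $|0\rangle$, as
shown below:

$$\begin{aligned}
&|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle, \quad |0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle, \\
&|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle, \quad |0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle, \\
&|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle, \quad |0\rangle|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle|0\rangle, \\
&|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle, \quad |1\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle . \qquad (65)
\end{aligned}$$

The probability that you will get one of the above sequences
is $p_{likely} = (0.95)^7 + 7(0.95)^6(0.05) = 0.955$. Of
course, these 'typical' states can be enumerated using
just three binary digits, i.e. 3 binary digits are sufficient
to enumerate 95.5% of all occurring sequences. This procedure
is analogous to labeling the typical sequences of 0s
and 1s shown in figure 6 except that we now 'enumerate'
the typical sequences with 'quantum states'. Now, let us
see how we can use this fact quantum mechanically. We
define a unitary transformation $U$ that has the following
effect:

$$\begin{aligned}
U|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle \\
U|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle \\
U|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle \\
U|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|1\rangle \\
U|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle \qquad (66) \\
U|0\rangle|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|0\rangle|1\rangle \\
U|0\rangle|1\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|1\rangle|0\rangle \\
U|1\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle &= |0\rangle|0\rangle|0\rangle|0\rangle|1\rangle|1\rangle|1\rangle .
\end{aligned}$$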

In this case the unitary transformation is a matrix that
maps a set of 8 orthogonal column vectors onto another set
of 8 orthogonal vectors in a complex vector space of
dimension $2^7$. The effect of this unitary transformation is
to compress the information about the typical sequences
into the last three qubits, while the first four qubits are
always in the same pure state $|0\rangle$ and therefore do not
carry any information. However, when $U$ acts on other,
less likely, sequences it will generate states that have
some of the first four qubits in state $|1\rangle$. Now comes
the crucial step: we throw away the first four qubits and
obtain a sequence of three qubits,

$$|0\rangle|0\rangle|0\rangle|0\rangle|x_1\rangle|x_2\rangle|x_3\rangle \to |x_1\rangle|x_2\rangle|x_3\rangle , \qquad (67)$$

where $x_1 x_2 x_3$ is the three-digit binary label of the typical sequence.
Therefore we have compressed the 7 qubits into 3 qubits.
Of course we need to see whether this compression can be
undone again. This is indeed the case: when these three
qubits are passed on to some other person, this person
adds four qubits all in the state $|0\rangle$, then applies
the inverse unitary transformation $U^{-1}$ and obtains
the states in equation (66) back. This implies that this
person will reconstruct the correct quantum state in at
least 95.5% of the cases, and this has been achieved by sending
only 3 qubits. As we showed in the classical case (see
equation |l^), in the limit of very long blocks composed
of $n$ qubits, our friend will be able to reconstruct almost
all quantum states by sending only $nH(0.95) = 0.2864n$
qubits. Note that this procedure also works when we
have a superposition of states. For example, the state

$$|\psi\rangle = \alpha|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle + \beta|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|0\rangle|1\rangle \qquad (68)$$

can be reconstructed perfectly if we just send the state
of three qubits given below:

$$|\psi'\rangle = \alpha|0\rangle|0\rangle|0\rangle + \beta|0\rangle|0\rangle|1\rangle . \qquad (69)$$

Therefore not only the states in equation (66) are
reconstructed perfectly, but also all superpositions of these
states.
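
The bookkeeping behind this example can be reproduced in a few lines. The Python sketch below is a classical illustration only: it lists the eight typical sequences, sums their probabilities, and builds the three-digit labels that equation (66) assigns to them. It does not construct the full $2^7$-dimensional unitary, only its action on the typical basis states, and the names used are hypothetical.

```python
from itertools import product

P0, P1, BLOCK = 0.95, 0.05, 7

def probability(seq):
    """Probability that the source emits the given sequence of 0s and 1s."""
    ones = sum(seq)
    return P1 ** ones * P0 ** (len(seq) - ones)

def to_int(seq):
    """Read a tuple of bits as a binary number."""
    return int("".join(map(str, seq)), 2)

# Typical sequences: all qubits in |0>, or exactly one qubit in |1>.
typical = sorted((s for s in product((0, 1), repeat=BLOCK) if sum(s) <= 1), key=to_int)

# Label the typical sequences with three binary digits, in the order of equation (66).
encode = {seq: format(label, "03b") for label, seq in enumerate(typical)}
decode = {bits: seq for seq, bits in encode.items()}

p_likely = sum(probability(s) for s in typical)
print(len(typical), p_likely)
for seq in typical:
    print("".join(map(str, seq)), "->", encode[seq])
```

Decoding with `decode` plays the role of appending four qubits in state $|0\rangle$ and applying $U^{-1}$: every typical sequence is recovered exactly, while the atypical sequences, which carry the remaining probability, are not represented at all.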

A very similar procedure would also work when we
have a source that emits quantum states $|\psi_i\rangle$ with
probabilities $p_i$, giving rise to an arbitrary density operator
$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. Unlike the example in equation (64),
the states $|\psi_i\rangle$ can be non-orthogonal states of a
two-level system, so the resulting density matrix is not in
diagonal form. In this slightly more complicated case, the
first step consists in finding the eigenvectors and
eigenvalues of $\rho$. As the eigenvectors belonging to different
eigenvalues are orthogonal, we are then in the situation of
equation (64). We can immediately see that the number of qubits
that need to be sent, to ensure that the probability with
which we can reconstruct the quantum state correctly is
arbitrarily close to unity, is given by n times the Shannon
entropy of the eigenvalues of $\rho$, which is in turn equal to
the von Neumann entropy $S(\rho)$. Since we can reconstruct
the quantum state $\rho^{\otimes n}$ of a system composed of n qubits
by sending only $nS(\rho)$ qubits, we say that $nS(\rho)$ is the
quantum information content of the composite system.
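As a hedged illustration of this recipe, the sketch below computes $S(\rho)$ for a source that emits two pure qubit states. It assumes real state vectors and uses the closed-form eigenvalues of a 2x2 matrix; the particular states ($|0\rangle$ and a state tilted by an angle `theta`) and the probabilities are arbitrary choices made for the example, not values taken from the text.

```python
from math import cos, sin, sqrt, log2, pi

def density_matrix(states, probs):
    """rho = sum_i p_i |psi_i><psi_i| for real two-component state vectors."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for (a, b), p in zip(states, probs):
        rho[0][0] += p * a * a
        rho[0][1] += p * a * b
        rho[1][0] += p * b * a
        rho[1][1] += p * b * b
    return rho

def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    gap = sqrt(max(tr * tr - 4.0 * det, 0.0))
    return [(tr + gap) / 2.0, (tr - gap) / 2.0]

def entropy_bits(weights):
    """Shannon entropy in bits; zero weights contribute nothing."""
    return -sum(w * log2(w) for w in weights if w > 1e-15)

theta = pi / 3                                    # hypothetical tilt angle
states = [(1.0, 0.0), (cos(theta), sin(theta))]   # overlap cos(theta)
probs = [0.7, 0.3]                                # hypothetical probabilities

rho = density_matrix(states, probs)
S = entropy_bits(eigenvalues_2x2(rho))   # von Neumann entropy S(rho)
H = entropy_bits(probs)                  # Shannon entropy of the p_i

n = 1000
print(f"S(rho) = {S:.4f} bits, H(p) = {H:.4f} bits")
print(f"qubits needed for {n} emitted states: about {n * S:.0f} instead of {n}")
```

Because the two states are non-orthogonal, $S(\rho)$ comes out smaller than the Shannon entropy of the probabilities $p_i$, so the compressed message is shorter than a classical encoding of the labels would be.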



2. Quantum data compression via Landauer's principle 

One may wonder whether the efficiency of quantum
data compression can be deduced from Landauer's
principle, and indeed this is possible. Given a source that
generates $|\psi_i\rangle$ with probabilities $p_i$, and gives rise to a
density operator $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$, we know from section
IV B that the entropy of erasure per qubit is given by
$S(\rho)\,k\ln 2$. Now let us assume that we could compress the
quantum information in state $\rho^{\otimes n}$ to $n(S(\rho) - \epsilon)$ qubits,
where $\epsilon > 0$. The state of each of these qubits will be
the maximally mixed state $\omega = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, because
otherwise we could compress it even further. We can
then calculate the entropy of erasure of the $n(S(\rho) - \epsilon)$
qubits in state $\omega$ and find of course $n(S(\rho) - \epsilon)S(\omega)\,k\ln 2 =
n(S(\rho) - \epsilon)H(\frac{1}{2})\,k\ln 2 = n(S(\rho) - \epsilon)\,k\ln 2$. Therefore the
total entropy of erasure would be given by the total
number of qubits times the entropy of erasure per qubit,
$n(S(\rho) - \epsilon) \times k\ln 2$, which is less than $nS(\rho)\,k\ln 2$. This,
however, cannot be, because Landauer's principle dictates
that the entropy of erasure cannot be less than $nS(\rho)\,k\ln 2$
if the compressed states are to hold the same amount of
information as the uncompressed states. Therefore, we
arrive at a contradiction, which demonstrates that the
efficiency of quantum data compression is limited by the
von Neumann entropy $S(\rho)$, just as classical data
compression is limited by the Shannon entropy. This is the
answer to the first part of the second question posed in
the introduction. We still need to find out whether this
similarity between classical and quantum information extends
also to the act of copying information.



D. Quantum information cannot be copied 

In this section, we use Landauer's erasure principle to
argue that, unlike classical bits, qubits cannot be copied.
This result is often termed the no-cloning theorem. The
basis of our argument is a reductio ad absurdum. We
show that if Bob can clone an unknown state sent to him
by Alice, then he can violate Landauer's principle. The
logical steps of this argument are discussed below in the
context of an example.

1. Alice starts by encoding two letters, 0 and 1,
occurring with equal probabilities, in the non-orthogonal
states $|\psi_0\rangle$ and $|\psi_1\rangle$.
We can find the upper bound to the information
transmitted per letter by using Landauer's principle.
As discussed in section IV A, the minimum
entropy of erasure generated by thermalisation of
the carriers' states is given by $S(\rho)$, where $\rho$
represents the incomplete knowledge that we have of the
state of each carrier:

$$\rho = \frac{1}{2}|\psi_0\rangle\langle\psi_0| + \frac{1}{2}|\psi_1\rangle\langle\psi_1|$$

After working out the matrix corresponding to $\rho$
and plugging it into the formula for the von Neumann
entropy, we find that the entropy of erasure,
and therefore the information, is equal to 0.6008
bits. This is less than 1 bit because the two states
are non-orthogonal, so the von Neumann entropy
is less than the Shannon entropy of the probability
distribution with which the states are chosen, i.e.
$H(\frac{1}{2}) = \log 2 = 1$ bit.

2. Alice sends the message states to Bob who has the 
task to decipher her message. Bob is also informed 
of how Alice encoded her letters (but of course he 
does not know the message!) and uses this infor- 
mation in his guess. No matter how clever Bob is, 
he will never recover more information than what 
Alice encoded (i.e. more than 0.6008 bits). 

3. Now let us assume that Bob owns a machine that 
can clone an arbitrary unknown quantum state and 
he uses it to clone an arbitrary number of times 
each of the message-states Alice sends to him. 

4. However, if Bob can clone the state of the
message-system, then, upon receiving either of the two states
$|\psi_0\rangle$ or $|\psi_1\rangle$, he can create a copy. Since the
probability of receiving each state is $\frac{1}{2}$, Bob will end
up holding either two copies of the first state, $|\psi_0\rangle|\psi_0\rangle$,
or two copies of the second state, $|\psi_1\rangle|\psi_1\rangle$. We can
compute the density operator that describes this
situation following the rules described in section
III:

$$\rho_{\mathrm{twocopies}} = \frac{1}{2}|\psi_0\rangle|\psi_0\rangle\langle\psi_0|\langle\psi_0| + \frac{1}{2}|\psi_1\rangle|\psi_1\rangle\langle\psi_1|\langle\psi_1| \qquad (73)$$

The density operator $\rho_{\mathrm{twocopies}}$ is represented by a 4 x 4
matrix. After finding the eigenvalues of this matrix
we can calculate its von Neumann entropy.
This is a measure of the classical
information that Bob has about the letter received
after cloning. We find:

$$S(\rho_{\mathrm{twocopies}}) = 0.8113 > 0.6008 \qquad (74)$$

Therefore the information content of the state has
increased, and if we pushed this further and
created infinitely many copies, then Bob would
perfectly distinguish between the two non-orthogonal
states and he could extract one bit of information
per letter-state received. This, however, is not
possible, as we cannot extract more information than Alice
originally encoded.
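The numbers in this argument can be reproduced with a few lines of code. The sketch assumes that the two message states have overlap $\langle\psi_0|\psi_1\rangle = 1/\sqrt{2}$ (for instance $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$); this is an assumption, chosen because it reproduces the quoted entropies. It uses the fact that an equal mixture of two pure states with overlap c has eigenvalues $(1 \pm |c|)/2$, and that m copies of each state have overlap $c^m$.

```python
from math import log2, sqrt

def binary_entropy(p):
    """Shannon entropy H(p) in bits of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_of_copies(overlap, copies):
    """Von Neumann entropy of (1/2)|a..a><a..a| + (1/2)|b..b><b..b|.

    The m-fold product states have overlap <a|b>**m, and an equal mixture of
    two pure states with overlap c has eigenvalues (1 + |c|)/2 and (1 - |c|)/2.
    """
    c = abs(overlap) ** copies
    return binary_entropy((1.0 + c) / 2.0)

overlap = 1.0 / sqrt(2.0)   # assumed overlap of the two message states

for m in (1, 2, 5, 20):
    print(f"{m:2d} copies: S = {entropy_of_copies(overlap, m):.4f} bits")
```

With this overlap, one copy gives about 0.60 bits and two copies about 0.81 bits, and the entropy approaches 1 bit as the number of copies grows. This is the increase in information that the argument above rules out.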



The no-cloning theorem represents one of the most
striking differences between classical and quantum
information. We therefore conclude this section on quantum
information by completing our answer to the second
question posed in the introduction. Quantum information
can be compressed in the sense described in section V C,
but it cannot be copied as we routinely do with classical
information.



VI. ENTANGLEMENT REVISITED 

In the previous sections, we have repeatedly encountered the
concept of entanglement as one of the central themes in
quantum information theory. However, we never
systematically addressed the question of what physical
properties make entangled states peculiar and how they can
be engineered and exploited for practical purposes in the
lab. We now embark on this task. Our approach here will
be based on worked-out examples. We have chosen the
same approach and numerical examples as in reference
[17], so that the reader who masters the topics presented
here can easily move on to a more comprehensive and
mathematical treatment. Throughout the following sections,
we concentrate exclusively on bipartite entanglement, for
which a sufficient understanding has been reached.



A. The ebit 



In section III B 5, we saw that any superposition
of the basis vectors ($|01\rangle$, $|11\rangle$, $|00\rangle$, $|10\rangle$) represents
the physical state of a bipartite system. So that must be
true also for the vector $|\sigma_{AB}\rangle$ given by:

$$|\sigma_{AB}\rangle = \alpha|0_A\rangle|1_B\rangle + \beta|1_A\rangle|0_B\rangle \qquad (75)$$

where $\alpha$ and $\beta$ are two arbitrary complex numbers such
that $|\alpha|^2 + |\beta|^2 = 1$. We quickly remind the reader that,
according to the rules of quantum mechanics, $|\alpha|^2$ is the
probability of finding the first system in state $|0\rangle$ and the
second in state $|1\rangle$ after a measurement, whereas $|\beta|^2$ is the
probability of finding the first system in state $|1\rangle$ and the
second in state $|0\rangle$. The states of systems A and B are clearly
anti-correlated. But this is not the whole story.

We remind the reader that what is remarkable about
$|\sigma_{AB}\rangle$ is that it is impossible to write it as a product
state. The state $|\sigma_{AB}\rangle$ is represented by a vector in the
enlarged Hilbert space $H_{AB}$ that cannot be factorised as
the tensor product of two vectors in $H_A$ and $H_B$.
Therefore, we reach the conclusion that $|\sigma_{AB}\rangle$ does represent
the state of a bipartite system, but we cannot assign a
definite state to its constituent components. In fact, even
the terminology "constituent components" is a bit
misleading in this context. We emphasize that the systems A
and B can be arbitrarily far from each other but
nevertheless constitute a single system. The entanglement of
the bipartite state $|\sigma_{AB}\rangle$ is then a measure of the
non-local correlations between the measurement outcomes for
system A and system B alone. These correlations are the
key to the famous Bell inequalities and the origin of much
philosophical and physical debate [31], and more recently
the basis for new technological applications.
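Whether a two-qubit pure state factorises can be checked mechanically. The sketch below is an illustration rather than a procedure taken from the text: it writes the state as $a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$ and uses the standard fact that such a state is a product state exactly when $ad - bc = 0$. The function name and the sample amplitudes are hypothetical.

```python
from math import sqrt

def is_product_state(a, b, c, d, tol=1e-12):
    """True if a|00> + b|01> + c|10> + d|11> factorises as |u>|v>.

    A product (x0|0> + x1|1>)(y0|0> + y1|1>) has amplitudes
    (x0*y0, x0*y1, x1*y0, x1*y1), so a*d - b*c vanishes; for two
    qubits the converse also holds.
    """
    return abs(a * d - b * c) < tol

alpha, beta = sqrt(0.8), sqrt(0.2)   # sample amplitudes, |alpha|^2 + |beta|^2 = 1

# alpha|01> + beta|10>: anti-correlated and not factorisable.
print(is_product_state(0.0, alpha, beta, 0.0))

# |0>(alpha|0> + beta|1>): a genuine product state.
print(is_product_state(alpha, beta, 0.0, 0.0))
```

For the anti-correlated state the quantity $ad - bc$ equals $-\alpha\beta$, which vanishes only when one of the two amplitudes is zero, that is, when the superposition reduces to a single product term.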

A basic question that arises in this context is how 
much entanglement is contained in an arbitrary quan- 
tum state? A general answer to this question has not 
been found yet, although quite a lot of progress has been 
made 0,|32|,|35|,[54j . In this article we confine ourselves 
to the simplest case of bipartite entanglement for which 
an extensive literature exists. As a first step we define 
the unit of entanglement for a bipartite system as the 
amount of entanglement contained in the maximally cor- 
related state: 
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We call this fundamental unit the ebit in analogy with the 
qubit and the bit. Note that this state differs from the
maximally correlated state $|\phi_{AB}\rangle$ encountered earlier only
by a local unitary transformation and should therefore
contain the same amount of entanglement. The reason
behind the name ebit will be clear after reading section
VI D, where we explain how to turn any multipartite
entangled system into a group of m ebits plus some
completely disentangled (product) states, just by using
local operations and classical communication. There is
another reason, related to communication, for choosing
the state in equation (76) as the unit of entanglement. One can
show that the ebit is the minimal amount of entanglement that
allows the non-local transfer of one unit of quantum
information. Such a procedure is quantum teleportation
of one qubit of quantum information. For our
purposes this process can be compared to the working of a
hypothetical quantum fax machine (see figure 13).

FIG. 13. A schematic picture of quantum state teleportation.
A qubit in an unknown quantum state is entered into
a machine which consumes one unit of entanglement (ebit)
and a local measurement whose four possible outcomes are
transmitted to the receiver. As a result the original state of
the qubit is destroyed at the sender's location and appears at
the receiver's end. The mathematical details can be found in



Alice, who is very far away from Bob, can transmit the
unknown quantum state of a qubit to Bob by using this
device. In what follows, we regard the quantum fax
machine as a black box (figure 13). We are not interested in
the internal mechanism of this device nor in the
procedures that Alice and Bob have to learn to make it work.
All we are interested in are the resources that this machine
exploits and of course the result that it produces. It turns
out that the only two resources needed to send the
unknown quantum state of ONE qubit from Alice to Bob
are:

1. ONE maximally entangled pair of particles shared
between Alice and Bob (represented by a wiggled
line in figure 13). For example, Bob is holding
system B and Alice system A, and the joint state is
$|\sigma_{AB}\rangle$ in equation (76).

2. TWO classical bits that Alice must send to Bob
through a classical channel like an ordinary phone
(represented by the telephone line in figure 13).

If these two resources are available Alice and Bob can 
successfully transmit the unknown quantum state of a 
qubit. The existence of such quantum fax machines sug- 
gests that the sending of 1 qubit can be accomplished by 
1 ebit plus 2 classical bits. 
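The resource count can be made concrete with a small state-vector simulation. The text treats the device as a black box, so the circuit below is an assumption: it is the standard teleportation protocol (a CNOT and a Hadamard on Alice's side, a two-qubit measurement, and a correction by Bob chosen from the two classical bits), and it takes the ebit in the form $(|00\rangle + |11\rangle)/\sqrt{2}$, which may differ from equation (76) by a local unitary. The function name `teleport` is hypothetical.

```python
from math import sqrt

def teleport(a, b):
    """Return Bob's final (normalised) state for each of Alice's four outcomes."""
    r = 1.0 / sqrt(2.0)
    # Joint state (a|0> + b|1>) (|00> + |11>)/sqrt(2); index = 4*q0 + 2*q1 + q2,
    # with q0 the unknown qubit, q1 Alice's half and q2 Bob's half of the ebit.
    psi = [0j] * 8
    psi[0b000], psi[0b011] = a * r, a * r
    psi[0b100], psi[0b111] = b * r, b * r

    # CNOT with control q0 and target q1: flip q1 whenever q0 = 1.
    cnot = list(psi)
    for q1 in (0, 1):
        for q2 in (0, 1):
            cnot[4 + 2 * q1 + q2] = psi[4 + 2 * (1 - q1) + q2]

    # Hadamard on q0.
    had = [0j] * 8
    for rest in range(4):
        x0, x1 = cnot[rest], cnot[4 + rest]
        had[rest] = r * (x0 + x1)
        had[4 + rest] = r * (x0 - x1)

    results = {}
    for m0 in (0, 1):            # outcome for the unknown qubit
        for m1 in (0, 1):        # outcome for Alice's half of the ebit
            # Project onto the outcome (m0, m1); what remains is Bob's qubit.
            bob = [had[4 * m0 + 2 * m1 + q2] for q2 in (0, 1)]
            norm = sqrt(sum(abs(x) ** 2 for x in bob))
            bob = [x / norm for x in bob]
            # Bob's correction, chosen from the two classical bits.
            if m1 == 1:          # bit flip X
                bob = [bob[1], bob[0]]
            if m0 == 1:          # phase flip Z
                bob = [bob[0], -bob[1]]
            results[(m0, m1)] = bob
    return results

a, b = 0.6, 0.8j                 # an arbitrary normalised input state
for bits, bob in teleport(a, b).items():
    print(bits, bob)
```

Each of the four measurement outcomes leaves Bob, after his correction, with the amplitudes (a, b) of the input state. The two bits (m0, m1) are exactly the classical information Alice has to phone through, and the ebit is used up by her measurement.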

There is an important difference between the quantum
and the classical fax machine. After Alice sends the qubit
to Bob, the state of her qubit (the original copy of the
quantum message) is destroyed. Only one qubit
survives the process and is in Bob's hands. Incidentally, also
the ebit that acted as a sort of quantum channel during
the communication is destroyed. Those who were
thinking of buying a quantum fax machine and using it also as
a quantum photocopier will be disappointed. The
reason for this is the no-cloning theorem discussed in
section V D. Furthermore, if we could clone we would
violate the law of the non-increase of entanglement under
local operations that we will explore in the next few
sections.



B. Classical versus quantum correlations 

In the last section we mentioned that bipartite entan- 
glement is a measure of quantum correlations between 
two spatially separated parts. We now want to make 
clear what is meant by quantum and classical correla- 
tions in the context of an example. 

Consider an apparatus that generates two beams of
light in the mixed state $\rho_{AB}$ given by:

$$\rho_{AB} = \frac{1}{2}|HH\rangle\langle HH| + \frac{1}{2}|VV\rangle\langle VV| \qquad (77)$$



The notation above represents our incomplete knowledge 
of the preparation procedure, namely the fact that we 
know that the two beams were prepared either both ver- 
tically polarized or both horizontally polarized but we 
do not know which of these two alternatives occurred. 
If we perform a polarization measurement on these two 
beams by placing the polarizer along the axis of vertical 
or horizontal polarization we will find half of the time 
the two beams both polarized in the vertical direction 
and half of the time in the horizontal direction. In this 
sense, the measurement outcomes for the two beams are 
maximally correlated. We say that mixed states like
$\rho_{AB}$ are classically correlated. The adjective classical is
there not because the systems considered are necessarily 
classical macroscopic objects, but rather because the ori- 
gin of this correlation can be perfectly explained in terms 
of classical reasoning. It simply arises from our lack of 
complete knowledge of the preparation procedure. 

If we represent the distinguishable single-beam states
$|H\rangle$ and $|V\rangle$ as the orthogonal column vectors
$(1, 0)^T$ and $(0, 1)^T$ respectively, we can then write the
state $\rho_{AB}$ in matrix form following the rules given in
section III:

$$\rho_{AB} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (78)$$

We now turn our attention to the maximally entangled
state $|\psi_{AB}\rangle = \frac{1}{\sqrt{2}}|HH\rangle + \frac{1}{\sqrt{2}}|VV\rangle$. When the two
beams are prepared in this pure state, the outcomes of
a polarization measurement along the vertical and
horizontal directions are maximally correlated, as in the
previous case. However, there is an important difference
between the two. The maximally entangled state is a
pure state. That means there is nothing more that we
can in principle know about it than what we can deduce
from its wave-function. So the origin of this correlation
is not lack of knowledge, because for a pure state we have
complete information on the preparation procedure. The
state $|\psi_{AB}\rangle$ can be represented mathematically using the
same conventional choice of basis vectors and following
the same hints as for the density matrix:

$$|\psi_{AB}\rangle\langle\psi_{AB}| = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \qquad (79)$$

A quick look at the entries of the matrix above shows
that $\rho_{AB}$ is indeed a different mathematical object from
$|\psi_{AB}\rangle\langle\psi_{AB}|$. But this mathematical difference on paper
means nothing if we cannot interpret it physically. In
other words, how can you distinguish these two states
from each other in the lab, if they seem to have the
same measurement statistics? The answer is: turn the
polarizer and measure again! Unfortunately, we cannot
perform this crucial experiment in front of the reader, but
we can try to model it on paper and predict the results
on the basis of our knowledge of measurement theory as
developed in section III.



For example imagine that you turn the polarizer by 
45°. Now you have two new orthogonal directions that 
you can label x and y. These new directions are analo- 
gous to the directions of horizontal and vertical polariza- 
tion considered before. 

The new polarization states $|X\rangle$ and $|Y\rangle$ can be expressed
in terms of the old ones by simple vector decomposition
(equation (80)).

It seems natural to ask the question: are the
measurement outcomes of the two beams still maximally
correlated (i.e. are the beams both found in either state $|X\rangle$ or
state $|Y\rangle$)? To answer this question, we can check whether
there is a non-vanishing probability of finding one of the
beams in state $|X\rangle$ and the other in state $|Y\rangle$. To do that
we first have to construct the column vectors representing
$|X\rangle$ and $|Y\rangle$ (see equation (17)), then the single-beam
projectors $|X\rangle\langle X|$ and $|Y\rangle\langle Y|$, and finally
the joint projector P given by $|X\rangle\langle X| \otimes |Y\rangle\langle Y|$ (see
equation (55)). We will not deprive the reader of the pleasure
of explicitly constructing the 4 x 4 matrix representing
P, a task well within reach if one follows the hints given
above. Once you have P, you can calculate the
probabilities of finding the two beams anti-correlated in the new
basis (i.e. when you measure with the polarizer turned
by $\pi/4$) for both the classically and the quantum correlated
states ($\mathrm{Prob}^{\rho}_{\mathrm{anticorrelated}}$ and $\mathrm{Prob}^{\psi}_{\mathrm{anticorrelated}}$). Note
that turning the polarizer affects the measurement, not
the preparation procedure of the states $\rho_{AB}$ and $|\psi_{AB}\rangle\langle\psi_{AB}|$,
which must be prepared exactly as before. We then find:

$$\mathrm{Prob}^{\rho}_{\mathrm{anticorrelated}} = \mathrm{tr}\{P\rho_{AB}\} = \frac{1}{4} \qquad (82)$$

$$\mathrm{Prob}^{\psi}_{\mathrm{anticorrelated}} = \mathrm{tr}\{P|\psi_{AB}\rangle\langle\psi_{AB}|\} = 0. \qquad (83)$$



The results above demonstrate that the two states
in equations (78) and (79) possess different forms of correlations,
which we revealed by going from the 'standard' basis to a
rotated basis. This trick is the basis for the formulation
of Bell inequalities [31], which show that a combination of
correlations measured along different rotated axes cannot
exceed a certain value when the state on which they
are measured is classically correlated. If we measure the
same set of correlations on a quantum mechanically
entangled state, then this limit can be exceeded, and this
has been confirmed in experiments.
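The two probabilities can be reproduced with a short calculation. The sketch below assumes one particular convention for the rotated basis, $|X\rangle = (|H\rangle + |V\rangle)/\sqrt{2}$ and $|Y\rangle = (|H\rangle - |V\rangle)/\sqrt{2}$; the overall sign of $|Y\rangle$ does not affect the probabilities. Since P projects onto the single vector $|X\rangle|Y\rangle$, each trace reduces to a squared overlap.

```python
from math import sqrt

r = 1.0 / sqrt(2.0)
H, V = (1.0, 0.0), (0.0, 1.0)
# Assumed convention for the basis rotated by 45 degrees.
X, Y = (r, r), (r, -r)

def kron(u, v):
    """Tensor product of two single-beam vectors (basis order HH, HV, VH, VV)."""
    return [ui * vj for ui in u for vj in v]

def overlap_sq(u, v):
    """|<u|v>|^2 for real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v)) ** 2

XY = kron(X, Y)   # P = |X><X| (x) |Y><Y| projects onto this vector

# Classical mixture: tr(P rho) = 1/2 |<XY|HH>|^2 + 1/2 |<XY|VV>|^2.
prob_mixed = 0.5 * overlap_sq(XY, kron(H, H)) + 0.5 * overlap_sq(XY, kron(V, V))

# Entangled state (|HH> + |VV>)/sqrt(2): tr(P |psi><psi|) = |<XY|psi>|^2.
psi = [r * (hh + vv) for hh, vv in zip(kron(H, H), kron(V, V))]
prob_entangled = overlap_sq(XY, psi)

print(f"mixed state:     {prob_mixed:.3f}")
print(f"entangled state: {prob_entangled:.3f}")
```

The mixed state is found anti-correlated a quarter of the time in the rotated basis, whereas the entangled state never is: the two contributions to $\langle XY|\psi_{AB}\rangle$ cancel exactly.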



C. How to create an entangled state? 

Another way to gain an intuitive understanding of the
differences between quantum and classical correlations is
to investigate the preparation procedures of the states $|\psi_{AB}\rangle$
and $\rho_{AB}$. The latter can be generated by two distant
parties, Alice and Bob, who have a beam of light each and are
allowed only 1) local operations on their own beam and
2) classical communication via an ordinary phone. The
entangled state instead cannot be created unless Bob and
Alice let their beams interact. More explicitly, suppose
that Alice and Bob are each given one beam of light
and are asked to create first the mixed state $\rho_{AB}$ and
then the pure entangled state $|\psi_{AB}\rangle$. What operations
are they going to perform, if they start with the same
resources in the two cases?

Let's first consider $\rho_{AB}$. Alice, who is in London, phones
Bob, who is in Boston, and tells him to prepare his beam
horizontally polarized. That amounts to sending one bit
of classical information (i.e. either H or V). Then she
prepares her beam also horizontally polarized. After
completing this operation the two have constructed the
product state $|HH\rangle$. Now they repeat the same procedure
many times, and each time they store their beams in two
rooms (one in London and the other in Boston) clearly
labeled with the SAME number (for example,
"experiment 1") and with an H to indicate that the beam is
horizontally polarized. After doing this n times, they
perform an analogous procedure to create $|VV\rangle$ and they
fill another n rooms carefully labeled with the same system,
but they write V rather than H, to indicate that they
store vertically polarized beams. Now, the two decide to
erase the letter H or V from each room but they keep
the labeling number. After the erasure, Alice and Bob
have an incomplete knowledge of the state of the two
beams contained in each pair of rooms labeled with the
same number. They know that the two beams are either
in state $|HH\rangle$ or $|VV\rangle$ but they do not know which. The
information the two hold on each of the pairs of
correlated beams contained in rooms labeled with the same
number is correctly described by $\rho_{AB}$. They have in fact
created an ensemble of pairs of beams in state $\rho_{AB}$ by
acting locally and just using phone calls. The example
above is a bit of a "theorist's description of what is going
on in the lab". The example captures the crucial fact
that classical correlations arise from 1) local
manipulations of the quantum states and 2) erasure of information
that in principle is available to some more knowledgeable
observers.

The situation is very different when Alice and Bob
want to create an entangled state and they start with two
completely disentangled product states, like one beam in
Boston and another independent one in London. In this
situation, one of the two has to take the plane and bring
his or her beam to interact with the other. Only at that
point can entanglement be created. In fact, one of the
basic results of quantum information theory is that the
net amount of entanglement in a system cannot be
increased by using classical communication and local
operations only. So, if Alice and Bob start with no
entanglement at all, then they are forced to bring the two beams
together and let them interact in order to create
entanglement. We now would like to illustrate an example of
two beams that are initially in a disentangled state and
become entangled by interacting with each other.
Suppose that Alice and Bob hold a beam each polarized at
an angle $\pi/4$ (see equation (80)). The two beams are
initially far away from each other so they are not
interacting. The joint system can be described mathematically
by the product state $|\psi_{AB}(0)\rangle$ given in equation (84).

The two beams in the product state $|\psi_{AB}(0)\rangle$ are
brought together and they start interacting with each
other. The time evolution of the original state is
determined by the joint Hamiltonian of the system H, which
is represented mathematically by a 4 x 4 Hermitian
matrix because it has to operate on vectors in the enlarged
Hilbert space. Let us pick a Hamiltonian of this
type, something easy so that the calculation does not get too
complicated, and let us see what happens.

$$H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \qquad (85)$$

The basis vectors used to write the Hamiltonian H are
the same as those used to write $|\psi_{AB}(0)\rangle$ in equation (84). Since
the matrix in equation (85) is diagonal, we can read off the
eigenstates and eigenvalues of the Hamiltonian. They are
the state vectors $|HH\rangle$, $|HV\rangle$, $|VH\rangle$ and $|VV\rangle$, and the
corresponding eigenvalues are equal to 1, 1, 1 and $-1$.

We can now write down the time evolution of the state
$|\psi_{AB}(0)\rangle$ by solving the Schrodinger equation with the
Hamiltonian H:

$$i\hbar\frac{d}{dt}|\psi_{AB}(t)\rangle = H|\psi_{AB}(t)\rangle \qquad (86)$$

The Schrodinger equation above is really a set of four
linear differential equations, one for each component of
the four-dimensional vector representing $\psi_{AB}(t)$.
Usually, these four differential equations would be coupled
by the Hamiltonian, so you would have to diagonalize the
corresponding matrix. In this case, however, the
Hamiltonian is already diagonal, so we can readily write the
solution of this set of equations in vector form as:

$$\psi_{AB}(t) = \exp\left(-\frac{i}{\hbar}Ht\right)\psi_{AB}(0). \qquad (87)$$

The exponential of the Hamiltonian $\exp(-\frac{i}{\hbar}Ht)$ is the
diagonal matrix whose entries are $\exp(-\frac{i}{\hbar}\lambda t)$, with $\lambda$
running over the eigenvalues of the Hamiltonian's matrix.
The reader can also check that
this time evolution matrix is unitary. After a certain time t
(never mind the units) the matrix $\exp(-\frac{i}{\hbar}Ht)$ takes the
form given in equation (88).

According to equation (87) you can now write down
the vector $\psi_{AB}(t)$ just by multiplying the unitary matrix
in equation (88) by the column vector $\psi_{AB}(0)$ given in
equation (84). The result is the state vector $|\psi_{AB}(t)\rangle$ of
equation (89), which represents the system after that time.

You can check by inspection that the state in equation
(89) is entangled (i.e. it cannot be factorized). The more
ambitious reader may consult reference [17], which explains
in simple terms the systematic criteria to check whether
the state of a bipartite system is entangled or not in the
context of this example.

Whatever way you choose to convince yourself that
the state above is entangled, the conclusion is the same.
States that can be factorized arise mathematically only
for very special choices of the entries of the corresponding
vectors. Under Hamiltonian evolution the values of these
entries will change, and in general it will not be
possible to factorize the state any more. The discussion above
shows that the process by which two independent systems
in a product state like $|\psi_{AB}(0)\rangle$ get entangled is indeed
quite natural, provided that the two systems are brought
together and left to interact with each other. However,
most interactions will not lead to a maximally entangled
state. It is therefore important for applications like
teleportation to devise techniques by which one can distill a
set of ebits from an ensemble of partially entangled states
like $|\psi_{AB}(t)\rangle$ in equation (89). This is the subject of the
next section.
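The evolution described in this section is easy to follow numerically. The sketch below assumes units with $\hbar = 1$ and takes the initial state to be both beams polarized at 45 degrees, i.e. equal amplitudes 1/2 on $|HH\rangle$, $|HV\rangle$, $|VH\rangle$ and $|VV\rangle$ (an assumption about the form of the initial product state). Because H is diagonal, $\exp(-iHt)$ simply multiplies each amplitude by $e^{-i\lambda t}$. As a measure of entanglement the code uses the quantity $2|ad - bc|$ (the concurrence of a pure two-qubit state), which is not used in the text; the sample times are arbitrary.

```python
import cmath
from math import pi

H_DIAGONAL = [1.0, 1.0, 1.0, -1.0]      # eigenvalues for |HH>, |HV>, |VH>, |VV>

def evolve(state, t, hbar=1.0):
    """Apply exp(-i H t / hbar) for the diagonal Hamiltonian of equation (85)."""
    return [cmath.exp(-1j * e * t / hbar) * amp for e, amp in zip(H_DIAGONAL, state)]

def concurrence(state):
    """2|ad - bc| for a|HH> + b|HV> + c|VH> + d|VV>: 0 = product, 1 = maximal."""
    a, b, c, d = state
    return 2.0 * abs(a * d - b * c)

# Assumed initial state: each beam in (|H> + |V>)/sqrt(2), so all amplitudes are 1/2.
psi0 = [0.5, 0.5, 0.5, 0.5]

for t in (0.0, pi / 8, pi / 4, pi / 2):
    psi_t = evolve(psi0, t)
    norm = sum(abs(x) ** 2 for x in psi_t)
    print(f"t = {t:.3f}: norm = {norm:.3f}, concurrence = {concurrence(psi_t):.3f}")
```

For this Hamiltonian and initial state the concurrence works out to $|\sin t|$: the state is a product state at t = 0, partially entangled at intermediate times, and the norm stays equal to one throughout, as it must under unitary evolution.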



D. Entanglement distillation 

We emphasize that the fundamental law of quantum 
information processing does not rule out the possibility to 
occasionally increase the net amount of entanglement in 
a system by using local operations and classical commu- 
nication only, provided that on average the net amount 
of entanglement is not increased. This implies that it 
should be possible to devise strategies to turn a partially 
entangled pair of particles into an ebit provided that this 
strategy sometimes leads to an increase and other times 
to a loss of entanglement so that on average the "en- 
tanglement balance" stays the same. We first consider a 
simple example of entanglement distillation and then we 
look at the efficiency of a general distillation procedure 
by using Landauer's principle. 



1. A simple example

Alice is still in London and Bob in Boston. They share
a non-maximally entangled pair of particles in the state
$|\psi_{AB}\rangle = \alpha|00\rangle + \beta|11\rangle$, where $\alpha > \beta$. They want to turn
it into an ebit, but they are only allowed to act locally on
their own particle and not to let the two interact.
Furthermore, their communication must be limited to
classical bits sent over an ordinary channel; nothing fancy like
sending or teleporting quantum states is allowed. The
reason why we demand such tough conditions on Bob
and Alice, and insist that they do not freely meet up, is
that we want to investigate the issue of locality
versus non-locality. This is really the main theme behind
our study of entanglement, so we have to be extra
careful in keeping track of what they do. That still leaves a
lot of room for manipulation on both Alice's and Bob's
side. For example, the two can add other particles on
their own side, let them interact with the entangled
particle they are holding, and perform measurements on
them. We now describe what operations the two perform
in order to distill one ebit.

1. Alice adds another particle in state $|0_A\rangle$ on her side.
Note that the subscript A denotes particles on
Alice's side and B on Bob's side. Now the joint state
of the entangled pair plus the extra particle is given
by the product state $|\psi_{tot}\rangle$ given below:

$$|\psi_{tot}\rangle = |0_A\rangle \otimes (\alpha|0_A\rangle|0_B\rangle + \beta|1_A\rangle|1_B\rangle). \qquad (90)$$

We can collect the states of the two particles on
Alice's side in the same four-dimensional column vector
and rewrite equation (90) as:

$$|\psi_{tot}\rangle = \alpha|00\rangle_A|0_B\rangle + \beta|01\rangle_A|1_B\rangle \qquad (91)$$

2. Now Alice performs a unitary transformation U on
her two particles. As we mentioned in the
previous section, a unitary transformation can be
implemented by letting the joint system evolve for a
certain time as dictated by a suitably chosen
Hamiltonian (see the example in equation (88)). The unitary
transformation U that Alice needs to implement on
the joint state of her two particles is given below in
matrix form:

$$U = \begin{pmatrix} \frac{\beta}{\alpha} & 0 & -\frac{\sqrt{\alpha^2 - \beta^2}}{\alpha} & 0 \\ 0 & 1 & 0 & 0 \\ \frac{\sqrt{\alpha^2 - \beta^2}}{\alpha} & 0 & \frac{\beta}{\alpha} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (92)$$

The reader can check that, when the unitary
transformation is applied to her states $|00\rangle_A$ and $|01\rangle_A$,
Alice achieves the following:

$$U|00\rangle_A = \frac{\beta}{\alpha}|00\rangle_A + \frac{\sqrt{\alpha^2 - \beta^2}}{\alpha}|10\rangle_A, \qquad U|01\rangle_A = |01\rangle_A. \qquad (93)$$



Hence, when the unitary transformation U is
applied to the joint state of the three particles $|\psi_{tot}\rangle$,
the state of the particle on Bob's side is
unaffected, whereas the state of the two on Alice's side
is changed according to equation (93):

$$U|\psi_{tot}\rangle = \beta|00\rangle_A|0_B\rangle + \sqrt{\alpha^2 - \beta^2}\,|10\rangle_A|0_B\rangle + \beta|01\rangle_A|1_B\rangle. \qquad (94)$$

We can split Alice's vector states in equation (94)
and isolate the state of the entangled pair from the
state of the particle added on Alice's side by writing
the latter first in the equation below:

$$U|\psi_{tot}\rangle = \sqrt{2}\,\beta\,|0_A\rangle \otimes \frac{1}{\sqrt{2}}\left(|0_A\rangle|0_B\rangle + |1_A\rangle|1_B\rangle\right) + \sqrt{\alpha^2 - \beta^2}\,|1_A\rangle|0_A\rangle|0_B\rangle \qquad (95)$$



3. Now, Alice decides to perform a measurement on
the extra particle she is holding on her side. She
chooses the observable that has $|0\rangle$ and $|1\rangle$ as its
eigenstates. There are two possible scenarios:

a) Alice finds the extra particle in state $|0\rangle$. Then
the total state is $|0_A\rangle \otimes \frac{1}{\sqrt{2}}(|0_A 0_B\rangle + |1_A 1_B\rangle)$. Alice
and Bob share a maximally entangled state. This
event occurs with probability $2\beta^2$.

b) Alice finds the extra particle in state $|1\rangle$. Then
the total state is $|1_A\rangle \otimes |0_A 0_B\rangle$. The procedure was
unsuccessful and the two lost their initial
entanglement. This possibility occurs with probability
$1 - 2\beta^2$.

4. Alice phones Bob and informs him of the
measurement outcome. If the procedure is successful, Bob
keeps his particle; otherwise they try again.
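The four steps above can be sketched as a small calculation. The code assumes real amplitudes with $\alpha > \beta > 0$, orders Alice's two-particle basis as $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the extra particle written first, and completes the matrix U with one possible choice of third and fourth columns; only its action on $|00\rangle_A$ and $|01\rangle_A$ is fixed by equation (93). The function name `distill_once` and the sample amplitudes are hypothetical.

```python
from math import sqrt

def distill_once(alpha, beta):
    """One round of the procedure for alpha|00> + beta|11> (real, alpha > beta > 0).

    Returns (probability of success, Alice-Bob state after success).
    """
    s = sqrt(alpha**2 - beta**2) / alpha
    c = beta / alpha
    # U on Alice's two particles, basis order |00>, |01>, |10>, |11>.
    # The first two columns follow equation (93); the others are one unitary completion.
    U = [[c, 0.0, -s, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [s, 0.0, c, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

    # |psi_tot> = alpha|00>_A|0_B> + beta|01>_A|1_B>, stored as amp[alice_index][bob].
    amp = [[alpha, 0.0], [0.0, beta], [0.0, 0.0], [0.0, 0.0]]
    out = [[sum(U[i][k] * amp[k][bob] for k in range(4)) for bob in (0, 1)]
           for i in range(4)]

    # Alice's index is 2*extra + pair, so rows 0 and 1 have the extra particle in |0>.
    success_rows = out[0:2]
    p_success = sum(x * x for row in success_rows for x in row)
    norm = sqrt(p_success)
    shared = [[x / norm for x in row] for row in success_rows]   # shared[a][b]
    return p_success, shared

alpha, beta = sqrt(0.8), sqrt(0.2)
p, shared = distill_once(alpha, beta)
print(f"success probability: {p:.3f} (2*beta^2 = {2 * beta**2:.3f})")
print("state after success:", shared)
```

For the sample amplitudes the success probability equals $2\beta^2$, and the state left after a successful measurement has equal amplitudes on $|00\rangle$ and $|11\rangle$, i.e. it is an ebit.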

A question that arises naturally in this context is the 
following: what is the maximum number of ebits that 
Alice and Bob can extract from a large ensemble of N 
non maximally entangled states? We will answer this 
question by using Landauer's erasure principle. 



2. Efficiency of entanglement distillation from Landauer's 
principle 

We start by considering an example of a process that 
will cause two systems to become entangled: a quantum 
measurement. A quantum measurement is a process by 
which the apparatus and the system interact with each 
other so that correlations are created between the states 
of the two. These correlations are a measure of the in- 
formation that an observer acquires on the state of the 
system if he knows the state of the apparatus. 

Consider an ensemble of systems S on which we want 
to perform measurements using apparatus A. A general 
way to write the state of S is 



IV>s 



1 N 
v i—i 



(96) 



where {|sj)} is an orthogonal basis. In our previous ex- 
ample, the orthogonal basis was given by the vertically 
and horizontally polarized states. When the apparatus 
is brought into contact with the system the joint state of 
S and A is given by 



VP. 



S+A) 



1 N 

t 5! 



The result of the act of measurement is to create cor- 
relations (ie. entanglement) between the apparatus and 
the system. The equation above is a generalization of 
equation (76). 

An observation is said to be imperfect when it is unable 
to distinguish between two different outcomes of a mea- 
surement. Let A be an imperfect measuring apparatus 
so that {|aj)} is NOT an orthogonal set. A consequence 
of the non-orthogonality of the states |<Zj) is that we are 
unable to distinguish with certainty the correlated states 
| s^. There is no maximal correlation between the state 
of the system and the apparatus, which means that 5* and 
A are not maximally entangled). However, suppose that 
by acting locally on the apparatus we can transform the 
whole state into the maximally entangled state 

$$|\phi_{S+A}\rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} |s_i\rangle |b_i\rangle \tag{98}$$

where $\{|b_i\rangle\}$ IS an orthogonal set. This does not increase 
the information between the apparatus and the system 
since we are not interacting with the system at all. In 
order to assess the efficiency of this distillation procedure 
we need to find the probability with which we can distill 
successfully. 

The state of the apparatus only, after the correlations 
are created, is given by the reduced density operator first 
encountered in section II C 5: 

$$\mathrm{tr}_S\left(|\psi_{S+A}\rangle\langle\psi_{S+A}|\right) = \rho_A. \tag{99}$$
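
As a simple illustration, take N = 2 and assume that the basis states $|s_1\rangle$, $|s_2\rangle$ are normalized and that the apparatus states are normalized with overlap $c = \langle a_1|a_2\rangle$ (the symbol $c$ is introduced here only for this example). Tracing over S then gives

$$\rho_A = \frac{1}{2}\left(|a_1\rangle\langle a_1| + |a_2\rangle\langle a_2|\right), \qquad \lambda_\pm = \frac{1 \pm |c|}{2}, \qquad S(\rho_A) = -\lambda_+ \log \lambda_+ - \lambda_- \log \lambda_- ,$$

where $\lambda_\pm$ are the eigenvalues of $\rho_A$ and the logarithm is taken to base 2. For orthogonal apparatus states ($c = 0$) the entropy is one full bit, while for indistinguishable apparatus states ($|c| = 1$) it vanishes.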






Landauer's principle states that to erase the information 
contained in the apparatus we need to generate in the environment
an entropy of erasure of at least $S(\rho_A)$, and 
this has to be greater than or equal to the information 
gain. After we purify the state to $|\phi_{S+A}\rangle$ with a probability
$p$, we gain $p \log N$ bits of information about the 
system. In fact, since we have maximal correlations now, 
the result of a measurement enables us to distinguish be- 
tween N equally likely outputs. The rest of the state 
contains no information because it is completely disen- 
tangled and therefore there are no correlations between 
the states of the system and the apparatus. After read- 
ing the state of the apparatus we will not gain any useful 
knowledge on the state of the system. 

By Landauer's principle, the entropy of erasure is 
greater than or equal to the information gain before pu- 
rification and this is in turn greater than or equal to the 
information the observer has after purification, because 
the apparatus is not interacting with the system so the 
information can not increase. We thus write 

$$S(\rho_A) \geq p \log N. \tag{100}$$

The upper bound to purification efficiency is therefore 

$$p \leq S(\rho_A)/\log N. \tag{101}$$
This bound obtained from Landauer's principle is actually
achievable as has been proven in [fj2| by construction 
of an explicit procedure that achieves it. It is neverthe- 
less satisfying that Landauer's principle is able to give a 
sharp upper bound with a minimal amount of technicali- 
ties and by doing so it provides an informal argument for 
using the von Neumann entropy as a measure of bipartite
entanglement. With this result we answer the last of 
the three questions posed in the introduction that have 
served as guidelines for our exploration of the physical 
theory of information. 
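
A minimal numerical sketch of the bound (101) in the simplest case may help to build intuition. It assumes N = 2, normalized states, and uses the fact that the reduced operator $\rho_A = \frac{1}{2}(|a_1\rangle\langle a_1| + |a_2\rangle\langle a_2|)$ has eigenvalues $(1 \pm |\langle a_1|a_2\rangle|)/2$. The function names are hypothetical and chosen only for this illustration; this is not the explicit distillation procedure referred to above.

```python
import math


def binary_entropy(x: float) -> float:
    """Shannon entropy, in bits, of the two-outcome distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # a certain outcome carries no entropy
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def distillation_bound(overlap: float) -> float:
    """Upper bound on p from p <= S(rho_A) / log N, for N = 2.

    `overlap` is |<a_1|a_2>|, assumed to lie in [0, 1].
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    # Eigenvalues of rho_A are (1 + overlap)/2 and (1 - overlap)/2,
    # so S(rho_A) is the binary entropy of either one of them.
    entropy = binary_entropy((1.0 + overlap) / 2.0)
    n_outcomes = 2  # assumption of this sketch: N = 2
    return entropy / math.log2(n_outcomes)


if __name__ == "__main__":
    for c in (0.0, 0.5, 0.9, 1.0):
        print(f"overlap = {c:.1f}  ->  p <= {distillation_bound(c):.3f}")
```

With orthogonal apparatus states (overlap 0) the bound equals 1, since the state is already maximally entangled; as the apparatus states become indistinguishable (overlap 1) the bound drops to 0.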



VII. CONCLUSION 

This is really the end of our long investigation of the 
properties of entanglement and of classical and quantum information.
We hope to have reasonably delivered what we 
promised in the introduction. Throughout the paper, 
we used the pedagogical technique of going back 
and forth among different aspects of the subject, each 
time increasing the level of sophistication of the ideas 
and mathematical tools employed. This method has the 
advantage of allowing enough time for "different layers of 
knowledge to sediment in the mind of the reader". Unfortunately,
there is also the inevitable side effect that a 
proper understanding of the subject matter will only fol- 
low when the reader goes through the material more than 
once. For example, the understanding of the differences 
between quantum and classical information crucially re- 
lies on the appreciation of the concepts of classical and 
quantum correlations that were explicitly studied only at 
the end of the article. No matter how hard we tried to ar- 
gue with words previously, a proper grasp of these topics 
came only after employing more advanced mathematical 
tools developed in later parts of the paper. 

To prevent the reader from feeling lost, we will now 
attempt to recap the content of the paper. In the first 
part, the scene was dominated by the Shannon entropy 
that helped us to define and evaluate the amount of clas- 
sical information encoded in a classical object or mes- 
sage. We were also able to find a bound on the classical 
information capacity of a noisy classical channel by using 
Landauer's principle. The answer depended once again 
on the Shannon entropy. Following a brief recap of quantum
mechanics, our interest shifted slightly to quantifying
the amount of classical information encoded in quantum
systems. This was achieved by introducing the von 
Neumann entropy. After developing suitable thermalization
procedures to erase information from quantum 
systems, we managed to employ Landauer's principle to 
justify the Holevo bound. This bound expresses the classical
information capacity of a noisy quantum channel in 
terms of the von Neumann entropy. That completed our 
investigation of classical information. 

We then turned our attention to quantifying the 
amount of quantum information encoded in a quantum 
object or message. This result, which is based on quan- 
tum data compression, was obtained employing Lan- 
dauer's principle and provided a solid basis for the intro- 
duction of the qubit as the fundamental unit of quantum 
information. The answer to this question was once again 
given by the von Neumann entropy. Quantum informa- 
tion can be compressed, but unlike classical information, 
it cannot be copied. This was our conclusion after study- 
ing the no-cloning theorem with the help of Landauer's 
erasure principle. 

Motivated by these successes we tried to shed light on 
the phenomenon of entanglement using Landauer's principle.
We explained that creating a pair of entangled 
states is not difficult after all. Any two systems initially 
uncorrelated will get entangled just by interacting with 
each other. However, it is not equally easy to create 
quantum states that are maximally entangled over large 
distances. This problem can be overcome by designing 
suitable distillation procedures by which maximally en- 
tangled states, ebits, are produced from an ensemble of 
non-maximally entangled states without increasing the 
total amount of entanglement. To some extent this pro- 
cedure provides a way to measure the amount of entan- 
glement (in ebits) contained in a system composed of only 
two parts. The efficiency of a distillation procedure was 
once again expressed in terms of the von Neumann en- 
tropy after carrying out a simple analysis based on Lan- 
dauer's principle. The von Neumann entropy in quantum 
information theory is so widespread as to justify the claim 
that the whole field is really about its use and interpretation,
just as classical information theory was based on the 
Shannon entropy. E^j. 

After reading this summary you might have noticed 
two glaring omissions in our treatment. Firstly, we spent 
a lot of time discussing the classical information capacity 
of a noisy classical and quantum channel, but we never 
mentioned the more interesting problem of the quantum 
information capacity of a noisy quantum channel. In 
other words, how many qubits can you send through a 
noisy channel when the letters of your message are encoded
in arbitrary quantum states? Secondly, we never 
mentioned how to generalize our discussion of entanglement
measures to the useful and interesting case of entangled
states composed of more than two particles. 

We reassure the reader that these omissions are not 
motivated by our compelling desire to meet the deadline 
for submission of this paper, but rather by the fact that 
nobody really knows the answer to these fundamental 
and natural questions. We do not know whether one can 
push Landauer's principle to investigate these problems. 
Landauer's principle is somehow limited to the erasure 
of classical information whereas the questions above are 
completely quantum. However, Landauer's principle can 
be used to yield upper bounds to entanglement distillation,
a completely non-classical procedure. Therefore the 
hope that Landauer's principle can shed some light on 
these unsolved problems may not remain unfulfilled. 
Anyway, these final remarks prove the point that, although
a large amount of work has been published since 
Shannon, there is still room for further research in the 
foundations of information theory. It is also evident that 
this research belongs to fundamental physics as much as 
it does to engineering. If you found some of the ideas in 
this paper fascinating and you wish to start working in 
the field, you may want to start by studying some further 
introductory texts such as p3|- |l7| , [i9|pO| . Perhaps some- 
day, we will find out the answer to the questions above 
from you. 